Convergence of Fundamental Limitations in Feedback Communication, Estimation, and Feedback Control over Gaussian Channels

Jialing Liu and Nicola Elia 
Abstract 

In this paper, we establish the connections of the fundamental limitations in feedback communication, 
estimation, and feedback control over Gaussian channels, from a unifying perspective for information, 
estimation, and control. The optimal feedback communication system over a Gaussian channel necessarily 
employs the Kalman filter (KF) algorithm, and hence can be transformed into an estimation system 
and a feedback control system over the same channel. It follows that the information rate of the 
communication system is alternatively given by the decay rate of the Cramer-Rao bound (CRB) of the 
estimation system and by the Bode integral (BI) of the control system. Furthermore, the optimal tradeoff 
between the channel input power and information rate in feedback communication is alternatively 
characterized by the optimal tradeoff between the (causal) one-step prediction mean-square error (MSE) 
and (anti-causal) smoothing MSE (of an appropriate form) in estimation, and by the optimal tradeoff 
between the regulated output variance with causal feedback and the disturbance rejection measure (BI 
or degree of anti-causality) in feedback control. All these optimal tradeoffs have an interpretation as 
the tradeoff between causality and anti-causality. Utilizing and motivated by these relations, we provide 
several new results regarding the feedback codes and information theoretic characterization of KF. 
Finally, the extension of the finite-horizon results to infinite horizon is briefly discussed under specific 
dimension assumptions (the asymptotic feedback capacity problem is left open in this paper). 

Keywords: Fundamental limitations; Gaussian channels with memory; confluence of feedback communication, 
estimation, and feedback control; Kalman filtering (KF); minimum mean-square error (MMSE); Bode integral (BI); 
smoothing, filtering, and prediction; causality versus anti-causality; Cover-Pombra coding structure; Schalkwijk- 
Kailath scheme; cheap control 

I. Introduction 

Communication systems in which the transmitters have access to noiseless feedback of channel outputs have been widely studied. The fundamental limitations in these systems, i.e. the feedback capacities, and the capacity-achieving codes have been a central focus in the information theoretic literature. As one of the most important cases, single-input single-output Gaussian channels with noiseless feedback have attracted considerable attention; see [1]-[15] and references therein for the capacity characterization and coding scheme design for these channels. Different approaches exist for addressing the fundamental limitations of such channels. They can be categorized roughly (by no means strictly, as the approaches are intrinsically related) as follows: 1) Estimation theory related approaches, which utilize concepts such as maximum likelihood (ML) or minimum mean-square error (MMSE) estimates in constructing the coding schemes (cf. e.g. [1], [2], [4], [5], [7], [16]); 2) Information theoretic approaches, most notably the Cover-Pombra formulation based on the asymptotic equipartition property (AEP) and the mutual information between the message and the channel outputs (cf. e.g. [6], [9], [15]), and the directed information formulation based on the input-output characterization of the channels¹ (cf. e.g. [11], [17], [18]); and 3) Control theory related approaches, which regard the feedback communication problems as optimal control problems (cf. e.g. [3], [11]-[14]).

In particular, Schalkwijk and Kailath [1], [2] proposed the Schalkwijk-Kailath (SK) codes for additive white Gaussian noise (AWGN) channels. These codes achieve the asymptotic feedback capacity, i.e. the infinite-horizon feedback capacity, denoted $C_\infty$, which is the highest information rate over the time span from 0 to infinity subject to an average power constraint. They also greatly reduce the coding complexity and coding delay. The SK codes were suggested by the Robbins-Monro stochastic approximation and the recursive ML algorithm, both of which have an estimation theoretic flavor. Along the line of [1], [2], Butman, Ozarow, and numerous other researchers have proposed extensions of the SK codes to Gaussian feedback channels with memory and obtained tight capacity bounds; see e.g. [4], [5], [7].

Cover and Pombra [6] introduced a general coding structure (called the Cover-Pombra structure, or the CP structure for short) based on classical information theoretic concepts. It achieves the finite-horizon feedback capacity of Gaussian channels with memory, denoted $C_T$, which is the highest information rate over the time span from 0 to $T$ subject to an average power constraint. Their development builds on the mutual information between the message and the channel outputs, which circumvents the causality issue pointed out by Massey [17] without appealing to directed information. It also relies on the AEP for arbitrary Gaussian processes. The CP structure was initially regarded as having prohibitive computational complexity when the coding length $(T + 1)$ is large (see, however, Section IV-A for a more detailed discussion). Efforts have therefore been made to reduce the complexity and to refine the CP structure. By exploiting the special properties of a moving-average Gaussian channel with feedback, Ordentlich [9] discovered the finite rank property of the innovations in the CP structure, which reduces the computational complexity. Shahar-Doron and Feder [10] reformulated the CP structure along this direction and obtained an SK-based coding scheme that achieves $C_T$ with reduced computational complexity. Furthermore, using the CP structure as a starting point, Kim [15] proved that a closed-form expression² gives the asymptotic capacity of a first-order moving-average Gaussian channel with feedback, and obtained an SK-based coding scheme that achieves $C_\infty$. To the best of our knowledge, this is the first Gaussian channel with memory (apart from the degenerate AWGN case) that has both an established asymptotic feedback capacity and available capacity-achieving codes. In a different direction, Vandenberghe et al [19] showed that the computation of $C_T$ based on the CP structure can be reformulated as a convex optimization problem.

Tatikonda and Mitter [11], [18] provided an extensive study of feedback communication systems and their capacities. They extended the notion of directed information proposed in [17] and proved that its supremum equals the operational capacity. They also reformulated the problem of computing $C_T$ as a stochastic control optimization problem, proposed a dynamic programming based solution, and characterized the sufficient statistics required for encoding and decoding. Yang et al [12] explored this idea further. Their work uncovered the Markov property of the optimal input distributions for Gaussian channels with memory and established a class of refined, finite-dimensional optimal input distributions. It eventually reduced the finite-horizon stochastic control optimization problem to a manageable size, with complexity $O(T)$. Moreover, under a stationarity conjecture that $C_\infty$ can be achieved by a stationary input process, $C_\infty$ is given by the solution of a finite-dimensional optimization problem. This is the first computationally efficient⁴ method to calculate the infinite-horizon feedback capacity of general Gaussian channels. A Kalman filter (KF) was used in [12] to generate the sufficient statistics of the output feedback.

1 The directed information in feedback communication systems may be viewed as the causal counterpart of the mutual information used in communication systems without feedback, the supremum of which (under applicable constraints, if any) is the capacity. See also Appendix II.

2 This expression was initially identified by Elia [14] and Yang et al [12] and has been conjectured to be $C_\infty$; however, a rigorous proof was not available until Kim [15].

3 By Gaussian channels with memory, researchers normally refer to frequency-selective Gaussian channels, including Gaussian channels with inter-symbol interference (ISI) and channels with colored Gaussian noise. This paper adopts the same convention, although some other Gaussian channels may also have memory. Gaussian channels with memory are sometimes referred to as general Gaussian channels (in contrast to the specific AWGN channels), or even simply as Gaussian channels.

Omura [3] identified a stochastic optimal control problem for feedback communication systems. Omura showed 
that the solution to the control problem is optimal for AWGN channels in the sense of achieving the capacity; 
however, it remained to be seen how this approach might be extended to achieve the capacities of more general channels⁵. Sahai and Mitter [13], [20] investigated the problem of tracking unstable sources over a channel and introduced the notion of anytime capacity to capture the fundamental limitations in that problem. This work again reveals connections between communication and control and brings various new insights to feedback communication problems. Furthermore, Elia [14] established the equivalence between reliable communication and stabilization over Gaussian channels with memory, showed that the achievable transmission rate is given by the Bode sensitivity integral of the associated control system, and presented an optimization problem based on robust control to compute lower bounds on $C_\infty$. These lower bounds can be achieved by generalized SK codes that have an interpretation of
tracking unstable sources over Gaussian channels. For a time-varying fading AWGN channel whose fade is modelled 
as a Markov process with channel output feedback and channel state information (CSI), a control-oriented coding 
scheme multiplexing across multiple subsystems according to CSI was constructed by Liu et al [21] to achieve 
the ergodic capacity, and it is shown to be an extension of the SK codes to time-varying channels with appropriate 
channel state information. For a recent survey of various topics on feedback communication, see e.g. [15], [22] 
and references therein. 

As we have seen, different approaches have proven useful in addressing the Gaussian feedback communication problem. This paper attempts to present a converging point: We study the Gaussian channels with feedback 
from a perspective that unifies information, estimation, and control, which encompasses many of the existing 
approaches scattered in the literature. We demonstrate that the feedback communication problem over a Gaussian 
channel can be reformulated as an optimal estimation problem or an optimal control problem. In fact, we show 
that the existing coding structures either necessarily contain Kalman filters or are reformulations of Kalman filters: 
The CP structure necessitates a KF in order to be optimal, the SK code can be easily obtained or extended by 
transforming a KF, and the control-oriented schemes can be derived from a KF by the duality between control 
and estimation [23]. As a result, the fundamental limitations in feedback communication, estimation, and feedback 
control coincide. 

In particular, the achievable rate of the feedback communication system is alternatively given by the decay rate of the Cramer-Rao bound (CRB) for the associated estimation system, as well as by the Bode integral (BI) of the associated control system. In addition, the fundamental limitations expressed as optimal tradeoffs coincide across feedback communication, estimation, and feedback control, and all of them may be interpreted as the tradeoff between causality and anti-causality. In feedback communication, this fundamental limitation is the optimal tradeoff between the input power and the information rate. In the associated estimation system, it can alternatively be characterized by the optimal tradeoff between (causal) one-step prediction and (anti-causal) smoothing. In the associated control system, it is the optimal tradeoff between the variance of a regulated output (generated using causal feedback) and the BI (or degree of anti-causality or instability). That is, the optimal pairs $(P, R)$, $(\mathrm{PMMSE}_T, \log\det(\mathrm{MMSE}_T^{-1})/(2T+2))$, and $(P_u, \log \mathrm{DI})$ correspond to each other. Here $P$ is the average channel input power and $R$ is the average information rate in the communication system. $\mathrm{PMMSE}_T$ is the time average of the one-step prediction MMSE of the to-be-estimated process in the estimation system, and $\mathrm{MMSE}_T$ is the anti-causal smoothing MMSE of the initial state of the process. $P_u$ is the variance of the regulated output $u$ (i.e. the control performance measure) in the control system. DI is the degree of instability of the open-loop system, defined as the product of the open-loop unstable eigenvalues, and its logarithm equals the Bode sensitivity integral (i.e. the disturbance rejection measure). The tradeoffs mean that if one wishes to keep the first element of a pair small (such as a low channel input power), the second element cannot be made arbitrarily large. See Sec. VII-C for more precise descriptions. We call DI the degree of anti-causality because it is associated with right-half plane (RHP) poles.

Other works have addressed various aspects of fundamental limits; we give an incomplete list. Van Trees [24] (pp. 501-511), de Bruijn, and Guo et al [25] (and references therein and subsequent works) discussed filtering versus smoothing as well as their relation to entropy and mutual information. Feng et al [26] examined the KF MMSE performance in relation to information theoretic measures. Iglesias and coauthors [27], [28] studied the BI and its information theoretic interpretation. Seron et al [29] presented connections between the fundamental limitations in control and in filtering. Martins and Dahleh [30], [31] studied the BI and entropy rates for systems over communication channels. See also [11], [20] and further discussion in Sec. VII-A.

4 Here we do not mean that their optimization problem is convex. The computational complexity associated with the optimization problem is determined mainly by the channel order, which does not grow as the time horizon increases to infinity.

5 Tatikonda and Mitter showed that the feedback capacity problem can be posed as a control problem. Omura, by contrast, formulated a control problem that minimizes the MMSE, and did not explore whether this yields information theoretic optimality except in the AWGN case [3]. Later works such as [11], [12], [14], [20] and the present paper have established results on the intrinsic relationship between communication and control within a more general framework.

Utilizing or motivated by the above mentioned equivalence relationship, we provide 1) New refinements to 
the Cover-Pombra capacity-achieving coding structure, including the complete characterization of the feedback 
generator; the necessity of KF in the CP structure; the orthogonality between future channel inputs and past 
channel outputs; the Gauss-Markov property of the transformed channel outputs; and the finite-dimensionality of 
the optimal message-carrying inputs. 2) Simple equivalence between generalized Schalkwijk-Kailath codes and the 
KF, which yields a convenient way to obtain a feedback communication scheme from an estimation problem. 3) 
Information theoretic characterization of KF; that is, the KF is not only a device to provide sufficient statistics 
(which was shown in [12]), but also a device to ensure the power efficiency and to recover the message optimally. 
4) The necessity of MMSE estimation in feedback communication problems over general additive noise channels 
with an average power constraint. Our results 1)-3) hold for AWGN channels with intersymbol interference (ISI), where the ISI is modelled as a stable and minimum-phase FDLTI system. Through the equivalence shown in [11], [12], this channel is equivalent to a colored Gaussian channel without ISI whose noise has a rational power spectrum, a setting assumed in a number of references. The above results are mainly derived in the finite horizon. We also show that the KF converges to a steady state as time goes to infinity, and that the equivalence holds in the steady-state system as well. Note, however, that the infinite-horizon feedback capacity problem (or the stationary feedback capacity problem) is left open in this paper⁶.

This paper is organized as follows. In Section II, we present a motivating example of feedback communication over an AWGN channel. In Section III, we describe the general Gaussian channel models. We then introduce the finite-horizon feedback capacity and the CP structure in Section IV. In Section V, we consider a general finite-horizon coding structure. It is closely related to the CP structure but makes the necessity of the KF algorithm in feedback communication easy to see. The presence of the KF links the feedback communication problem to an estimation problem and a control problem, as shown in Section VI. We therefore rewrite the information rate and input power in terms of estimation theory quantities and control theory quantities and explore the connections in Section VII. Further necessary conditions for the optimality of the coding structure are given in Section VIII. Sections V to VIII focus on the finite horizon. In Section IX, we extend the horizon to infinity and characterize the steady-state behavior.

Notations: We use underlines to denote vectors and boldface to denote matrices. For ease of reading, all vectors in this paper are column vectors. We denote the transpose by '. We represent time indices by subscripts, such as $y_t$. We denote by $y^T$ the collection $\{y_0, y_1, \dots, y_T\}$, and by $\{y_t\}$ the sequence $\{y_t\}_{t=0}^{\infty}$. We assume that all processes start at time 0, which is consistent with the convention in dynamical systems but differs from the information theory literature. We use $h(X)$ for the differential entropy of the random variable $X$. For a random vector $y^T$, we denote its covariance matrix by $K_y^{(T)}$. The norm $\|y\|$ is the Euclidean norm of the vector. We denote by $T_{xy}(z)$ the transfer function from $x$ to $y$. Because a linear input-output relation (linear system) $Z(z)$ can alternatively be captured by a matrix, we denote the matrix associated with the linear system $Z(z)$ by a boldface script Z. We write "defined to be" as ":=". We use $(A, B, C', D)$ to represent the system

$$ x_{t+1} = A x_t + B u_t, \qquad y_t = C' x_t + D u_t. \tag{1} $$

6 We note that Kim in [32], and further in [33], claims that the stationarity conjecture holds. This implies that the stationary feedback capacity equals the asymptotic feedback capacity.

Finally, in this paper, by "capacity" we refer to the feedback capacity, if not specified otherwise. 

II. Motivating example: feedback capacity and optimal schemes for an AWGN channel

To help the reader understand the intuition behind our study, we present a simple example over an AWGN 
channel before we turn to Gaussian channels with memory. Below, we introduce a simple KF system (see Fig. 1(a)), followed by a straightforward rewrite of it (see Fig. 1(b)) that can be interpreted as a feedback 
communication system. Finally we show that this feedback communication system is optimal as it is equivalent to 
the optimal SK scheme. It motivates the further exploration of the connections among feedback communication, 
estimation, and feedback control. 

1) A Kalman Filter Problem: Consider a standard KF problem for a first-order unstable LTI system with 
noisy measurements: 

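$$ \begin{cases} x_{t+1} = a x_t \\ r_t = c x_t \\ y_t = r_t + N_t, \end{cases} \tag{2} $$

where $x_0$ is unknown, $a > 1$ (i.e. the system is unstable), $a$ and $c$ are known, and $N_t \overset{\text{i.i.d.}}{\sim} \mathcal{N}(0, 1)$. The KF provides the MMSE estimate of $\{x_t\}$ based on the noisy measurement process $\{y_t\}$. The (steady-state)⁷ KF is described as follows (see Fig. 1(a) for the block diagram):

$$ \begin{cases} \hat x_{t+1} = a \hat x_t + L e_t \\ \hat r_t = c \hat x_t \\ e_t = y_t - c \hat x_t, \end{cases} \tag{3} $$

where

$$ L := \frac{a c \Sigma}{1 + c^2 \Sigma} \tag{4} $$

is the asymptotic Kalman filter gain. Here $\Sigma$ is the asymptotic error covariance for $x_t$, i.e. $\Sigma = \lim_{t \to \infty} E(x_t - \hat x_t)(x_t - \hat x_t)'$, which is the positive solution to the discrete-time algebraic Riccati equation (DARE)

$$ \Sigma = a^2 \Sigma - \frac{a^2 c^2 \Sigma^2}{1 + c^2 \Sigma}. \tag{5} $$

The right-hand side of (5) simplifies to $a^2 \Sigma / (1 + c^2 \Sigma)$, so its positive solution satisfies $1 + c^2 \Sigma = a^2$. Solving the DARE, we obtain

$$ \Sigma = \frac{a^2 - 1}{c^2}, \qquad L = \frac{a^2 - 1}{a c}. \tag{6} $$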
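The closed form (6) can be checked numerically by iterating the Riccati recursion behind (5) from an arbitrary positive start until it settles. The sketch below does this and compares the limit and the resulting gain (4) with (6). The helper name `scalar_kf_steady_state`, the starting value, and the iteration count are illustrative assumptions, not part of the original development.

```python
def scalar_kf_steady_state(a, c, sigma0=1.0, iters=200):
    """Iterate the scalar Riccati recursion behind the DARE (5).

    Hypothetical helper: starting from sigma0 > 0, repeat
    Sigma <- a^2 Sigma - a^2 c^2 Sigma^2 / (1 + c^2 Sigma),
    then return the limit and the gain L = a c Sigma / (1 + c^2 Sigma) of (4).
    """
    sigma = sigma0
    for _ in range(iters):
        sigma = a**2 * sigma - (a * c * sigma) ** 2 / (1.0 + c**2 * sigma)
    gain = a * c * sigma / (1.0 + c**2 * sigma)
    return sigma, gain


if __name__ == "__main__":
    a, c = 2.0, 0.5
    sigma, gain = scalar_kf_steady_state(a, c)
    # Compare with the closed form (6): Sigma = (a^2 - 1)/c^2, L = (a^2 - 1)/(a c)
    print(sigma, (a**2 - 1) / c**2)
    print(gain, (a**2 - 1) / (a * c))
```

Near the fixed point, the iteration contracts the error by roughly $1/a^2$ per step, so a few hundred iterations are far more than enough for moderate $a > 1$.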



7 Though $\{y_t\}$ is neither stationary nor even asymptotically stationary, a time-varying or time-invariant (steady-state) KF can be built to guarantee bounded error covariance for estimating $x_t$. As pointed out in Chapter 14 of [23], the difference between the time-varying and time-invariant filters vanishes as time increases.
Fig. 1. (a) A KF problem. (b) A KF-based coding structure.



2) KF-based Feedback Communication: Next, as illustrated in Fig. 1(b), we introduce a feedback communication coding scheme over an AWGN channel by slightly changing the KF problem shown in Fig. 1(a). In Fig. 1(a) the loop is closed after the AWGN $N_t$, i.e. $(-\hat r_t)$ is added to $y_t$. In Fig. 1(b) the loop is instead closed before the AWGN $N_t$, i.e. $(-\hat r_t)$ is added to $r_t$. This changes only the signals between the two adders. As indicated in Fig. 1(b), one can identify the encoder, the AWGN channel, and the decoder, described below for time $t = 0, 1, \dots$.

$$ \text{AWGN channel:} \quad y_t = u_t + N_t, \tag{7} $$

where $u_t$ is the channel input, $N_t \overset{\text{i.i.d.}}{\sim} \mathcal{N}(0, 1)$ is the channel noise, and $y_t$ is the channel output. At time $t$, the encoder can access $\hat r_t$ (generated from $y^{t-1}$) via the noiseless feedback link:

$$ \text{encoder dynamics:} \quad \begin{cases} x_{t+1} = a x_t \\ r_t = c x_t \\ u_t = r_t - \hat r_t, \end{cases} \tag{8} $$
where $a$ and $c$ are encoder design parameters. The encoding procedure is as follows. Fix a set of $M_T$ equally likely messages, partition the interval $[-\frac{1}{2}, \frac{1}{2}]$ into $M_T$ equal sub-intervals, and map the sub-interval centers to the set of $M_T$ messages; this mapping is known to both the transmitter and the receiver a priori. To transmit, let $x_0 := W$, the sub-interval center representing the to-be-transmitted message. In other words, the initial condition (at time 0) of the transmitter is the to-be-transmitted message.

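$$ \text{decoder dynamics:} \quad \begin{cases} \hat x_{t+1} = a \hat x_t + L y_t \\ \hat r_t = c \hat x_t \\ \hat x_{0,t} = a^{-t-1} \hat x_{t+1}, \end{cases} \tag{9} $$

The decoding procedure simply maps $\hat x_{0,T}$ to the closest sub-interval center. (Note that in Fig. 1(b), $y_t = e_t$.)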
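The message-to-center mapping and the nearest-center decoding rule described above can be written as two small functions. This is a minimal sketch: the names `message_center` and `nearest_message` are hypothetical. Estimates that fall outside $[-\frac{1}{2}, \frac{1}{2}]$ are clipped to the first or last message, an assumption the text leaves implicit.

```python
import math


def message_center(k, num_messages):
    """Center of the k-th of num_messages equal sub-intervals of [-1/2, 1/2].

    Hypothetical helper; k ranges over 0, ..., num_messages - 1.
    """
    return (k + 0.5) / num_messages - 0.5


def nearest_message(x0_hat, num_messages):
    """Index of the sub-interval center closest to the estimate x0_hat.

    The closest center is the center of the sub-interval containing x0_hat;
    estimates outside [-1/2, 1/2] are clipped to the first or last message.
    """
    k = math.floor((x0_hat + 0.5) * num_messages)
    return min(max(k, 0), num_messages - 1)


if __name__ == "__main__":
    M = 8
    x0 = message_center(5, M)              # transmit message index 5
    print(x0)                              # 0.1875
    print(nearest_message(x0 + 0.04, M))   # 5, since 0.04 < half-width 1/16
```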

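The objective of the feedback communication problem is, under an average channel input power constraint

$$ \frac{1}{T+1} E\|u^T\|^2 \le \mathcal{P} \quad \text{or} \quad \lim_{T \to \infty} \frac{1}{T+1} E\|u^T\|^2 \le \mathcal{P} \tag{10} $$

with $\mathcal{P} > 0$ being the power budget, to achieve

$$ C_{fb}(\mathcal{P}) = C_{nf}(\mathcal{P}) = \frac{1}{2} \log(1 + \mathcal{P}), \tag{11} $$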
where $C_{fb}(\mathcal{P})$ is the feedback capacity and $C_{nf}(\mathcal{P})$ is the non-feedback capacity in either the finite horizon (time 0 to $T$) or the infinite horizon (time 0 to $\infty$). To attain this objective, fix any coding length $(T + 1)$ and any $\epsilon > 0$, where $\epsilon$ is an arbitrarily small slack from the capacity $C_{fb}$. Then let $a := \sqrt{1 + \mathcal{P}}$, let $c \neq 0$ be arbitrary, set $M_T := a^{(T+1)(1-\epsilon)}$, and follow the encoding/decoding dynamics and procedures described above. It can be shown that this scheme can transmit any one of the $M_T$ messages with vanishing probability of error as $T \to \infty$ while satisfying the power constraint (10). Instead of proving optimality directly, we may show that the coding scheme in Fig. 1(b) is a simple reformulation of the well-known SK coding scheme, which has been shown to achieve the feedback capacity of the AWGN channel. To this end, Fig. 2 illustrates a slight variation⁸ of the original SK scheme proposed in [2]. In this figure, one can identify the encoder, the AWGN channel, the decoder, and the feedback link with a one-step delay.
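The claim that the KF-based scheme (7)-(9) meets the power constraint while the decoding error vanishes can be illustrated with a small Monte Carlo sketch. The sketch assumes $\mathcal{P} = 1$, $c = 1$, coding length $T + 1 = 60$, slack $\epsilon = 0.25$, a decoder state initialized at $\hat x_0 = 0$, and uniformly drawn messages. These values and the function name `simulate_kf_scheme` are illustrative choices that do not come from the text. The run is a numerical illustration, not a proof of optimality.

```python
import numpy as np


def simulate_kf_scheme(P=1.0, c=1.0, T=59, eps=0.25, trials=2000, seed=0):
    """Monte Carlo run of the KF-based feedback scheme (7)-(9) over an AWGN channel.

    Illustrative sketch: the decoder state starts at x_hat_0 = 0 (an assumption)
    and messages are drawn uniformly from the M_T sub-interval centers.
    Returns (average input power, message error rate, M_T).
    """
    rng = np.random.default_rng(seed)
    a = np.sqrt(1.0 + P)                     # a := sqrt(1 + P)
    L = (a**2 - 1.0) / (a * c)               # steady-state KF gain, eq. (6)
    M = int(a ** ((T + 1) * (1.0 - eps)))    # M_T = a^{(T+1)(1-eps)}, rounded down

    idx = rng.integers(0, M, size=trials)    # transmitted message indices
    x = (idx + 0.5) / M - 0.5                # x_0 := W, the sub-interval center
    x_hat = np.zeros(trials)                 # decoder state, known to the encoder via feedback

    power = 0.0
    for _ in range(T + 1):
        u = c * x - c * x_hat                # u_t = r_t - r_hat_t, eq. (8)
        y = u + rng.standard_normal(trials)  # AWGN channel, eq. (7)
        power += np.mean(u**2)
        x = a * x                            # x_{t+1} = a x_t
        x_hat = a * x_hat + L * y            # x_hat_{t+1} = a x_hat_t + L y_t, eq. (9)

    x0_hat = x_hat / a ** (T + 1)            # x_hat_{0,T} = a^{-(T+1)} x_hat_{T+1}
    idx_hat = np.clip(np.floor((x0_hat + 0.5) * M), 0, M - 1).astype(np.int64)
    return power / (T + 1), float(np.mean(idx_hat != idx)), M


if __name__ == "__main__":
    avg_power, err_rate, M = simulate_kf_scheme()
    # Rate in bits per channel use: log2(M_T) / (T + 1)
    print(avg_power, err_rate, np.log2(M) / 60)
```

With these settings, one would expect an average power close to $\mathcal{P} = 1$ (slightly lower, because the first inputs start from $\hat x_0 = 0$) and no decoding errors. The rate is $(1 - \epsilon) \cdot \frac{1}{2}\log_2(1 + \mathcal{P}) = 0.375$ bits per channel use. Shrinking $\epsilon$ pushes the rate toward capacity but requires a longer coding length for the same error rate.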
Fig. 2. The SK coding scheme.



To see the connection between the two coding schemes, note that in the SK scheme it holds that

$$ \begin{cases} u_t = g a^t (\hat x_{0,t-1} - x_0) \\ \hat x_{0,t} = \hat x_{0,t-1} - \dfrac{g}{a^{t+2}} y_t, \end{cases} \tag{12} $$

and in the KF-based scheme it holds that

$$ \begin{cases} u_t = c a^t (x_0 - \hat x_{0,t-1}) \\ \hat x_{0,t} = \hat x_{0,t-1} + a^{-t-1} L y_t. \end{cases} \tag{13} $$

If we define

$$ g := \sqrt{a^2 - 1}, \qquad c := -g, \tag{14} $$

then the two schemes generate identical channel inputs, channel outputs, and decoder estimates, and hence they are equivalent. The optimal choice of $g$ in the SK coding scheme indeed corresponds to the (optimal) KF gain. We conclude that the SK scheme essentially implements the KF algorithm. More insights can be obtained from this AWGN example; see Chapter 3 of [22]. These insights extend to Gaussian channels with memory, to which we now turn.
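The equivalence claimed for (12)-(14) can be checked numerically by driving both recursions with the same noise realization. Under (14) and (6), the KF gain becomes $L = -g/a$. Both schemes should then produce the same inputs $u_t$ and the same estimates of $x_0$, up to floating-point rounding. In this sketch, the function name `run_sk_and_kf`, the message value `x0`, the horizon, and the zero initial estimate are illustrative assumptions.

```python
import numpy as np


def run_sk_and_kf(P=1.0, T=20, x0=0.123, seed=1):
    """Run the SK recursion (12) and the KF-based recursion (13) on the same noise.

    Illustrative sketch: both decoders start from the estimate 0 (an assumption),
    and g, c, L are set according to (14) and (6).
    """
    rng = np.random.default_rng(seed)
    noise = rng.standard_normal(T + 1)
    a = np.sqrt(1.0 + P)
    g = np.sqrt(a**2 - 1.0)             # eq. (14)
    c = -g                              # eq. (14)
    L = (a**2 - 1.0) / (a * c)          # KF gain, eq. (6); equals -g / a here

    u_sk, est_sk = np.zeros(T + 1), 0.0
    u_kf, est_kf = np.zeros(T + 1), 0.0
    for t in range(T + 1):
        # SK scheme, eq. (12)
        u_sk[t] = g * a**t * (est_sk - x0)
        est_sk = est_sk - g / a ** (t + 2) * (u_sk[t] + noise[t])
        # KF-based scheme, eq. (13)
        u_kf[t] = c * a**t * (x0 - est_kf)
        est_kf = est_kf + a ** (-t - 1) * L * (u_kf[t] + noise[t])
    return u_sk, u_kf, est_sk, est_kf


if __name__ == "__main__":
    u_sk, u_kf, est_sk, est_kf = run_sk_and_kf()
    print(np.max(np.abs(u_sk - u_kf)), est_sk, est_kf)
```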

III. Channel model 

In this section, we briefly describe two Gaussian channel models, namely the colored Gaussian noise channel 
without ISI and white Gaussian noise channel with ISI. 



8 A few SK-type schemes and their variations are compared in [21]. The variation here performs the same operations at every step, whereas in the scheme of [2] the initialization step differs from the later steps. See also [14], [34].
A. Colored Gaussian noise channel without ISI

Fig. 3(a) shows a colored Gaussian noise channel without ISI. At time $t$, this discrete-time channel is described as

$$ y_t = u_t + Z_t, \quad t = 0, 1, \dots, \tag{15} $$



where $u_t$ is the channel input, $Z_t$ is the channel noise, and $y_t$ is the channel output. We make the following assumptions. The colored noise $\{Z_t\}$ is the output of a finite-dimensional, stable, and minimum-phase linear time-invariant (LTI) system $Z(z)$, driven by a white Gaussian process $\{N_t\}$ with zero mean and unit variance, and $Z(z)$ is at initial rest. The LTI system $Z(z)$ has order (or dimension) $m$ and $Z(\infty) \neq 0$, i.e. $Z(z)$ is proper but not strictly proper. We further assume, without loss of generality, that $Z(\infty) = 1$; if $g := Z(\infty) \neq 1$, we can normalize $Z(z)$ by the scaling factor $1/g$. The finite dimensionality of $Z(z)$ then implies that $Z(z)$ admits the following transfer function representation



$$ Z(z) = \frac{z^m + f_{m-1} z^{m-1} + \cdots + f_1 z + f_0}{z^m + (f_{m-1} + g_{m-1}) z^{m-1} + \cdots + (f_1 + g_1) z + (f_0 + g_0)}, \tag{16} $$

where $\{f_0, \dots, f_{m-1}\}$ and $\{g_0, \dots, g_{m-1}\}$ are such that $Z(z)$ is stable and minimum phase. Define

$$ Z_z(z) := \frac{z^m + f_{m-1} z^{m-1} + \cdots + f_1 z + f_0}{z^m}, \qquad Z_p(z) := \frac{z^m + (f_{m-1} + g_{m-1}) z^{m-1} + \cdots + (f_1 + g_1) z + (f_0 + g_0)}{z^m}. \tag{17} $$

Then it holds that

$$ Z(z) = \frac{Z_z(z)}{Z_p(z)}, \tag{18} $$

that is, $Z_p(z)$ and $Z_z(z)$ contain the information about the poles and zeros of $Z(z)$, respectively. For future reference, we define

$$ G_z := [f_{m-1}, \dots, f_0]', \qquad G_p := [f_{m-1} + g_{m-1}, \dots, f_0 + g_0]', \tag{19} $$

that is, $G_z'$ and $G_p'$ are the output matrices (vectors) of the systems $Z_z(z)$ and $Z_p(z)$ (see Appendix I-A for the relevant state-space representation concepts).
Fig. 3. (a) A colored Gaussian noise channel without ISI. (b) The induced ISI channel with AWGN. (c) State-space realization of channel $\mathcal{T}$.

We can also represent the input-output relation with time-domain operators (matrices). For any block size (i.e. coding length) $(T + 1)$, we may equivalently generate $Z^T$ by

$$ Z^T = \mathbf{Z}_T N^T, \tag{20} $$
where $\mathbf{Z}_T$ is the $(T+1) \times (T+1)$ lower triangular Toeplitz matrix of the impulse response of $Z(z)$. Since we have assumed that $Z(\infty) = 1$, the diagonal elements of $\mathbf{Z}_T$ are all 1. Likewise, $\mathbf{Z}_{z,T}$ and $\mathbf{Z}_{p,T}$, the matrix representations of $Z_z(z)$ and $Z_p(z)$, are given respectively by



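$$
\mathbf{Z}_{z,T} :=
\begin{bmatrix}
1 & & & & \\
f_{m-1} & 1 & & & \\
\vdots & f_{m-1} & \ddots & & \\
f_0 & \vdots & \ddots & 1 & \\
 & f_0 & \cdots & f_{m-1} & 1
\end{bmatrix},
\qquad
\mathbf{Z}_{p,T} :=
\begin{bmatrix}
1 & & & & \\
f_{m-1} + g_{m-1} & 1 & & & \\
\vdots & f_{m-1} + g_{m-1} & \ddots & & \\
f_0 + g_0 & \vdots & \ddots & 1 & \\
 & f_0 + g_0 & \cdots & f_{m-1} + g_{m-1} & 1
\end{bmatrix}, \tag{21}
$$

where blank entries are zero.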

that is, $\mathbf{Z}_{p,T}$ and $\mathbf{Z}_{z,T}$ are lower triangular, Toeplitz, and banded with bandwidth $(m+1)$, corresponding to causal, LTI, $m$th-order moving-average (MA-$m$) filters. Therefore, it holds that

$$ \mathbf{Z}_T = \mathbf{Z}_{p,T}^{-1} \mathbf{Z}_{z,T} = \mathbf{Z}_{z,T} \mathbf{Z}_{p,T}^{-1}. \tag{22} $$

As a consequence of the above assumptions, $\{Z_t\}$ is asymptotically stationary. There is no loss of generality in assuming that $Z(z)$ is stable and minimum-phase (cf. Chapter 11, [35]).
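Relations (21)-(22) can be checked numerically. Build $\mathbf{Z}_{z,T}$ and $\mathbf{Z}_{p,T}$ from the coefficients in (16) and form $\mathbf{Z}_{p,T}^{-1} \mathbf{Z}_{z,T}$. The result should match the lower triangular Toeplitz matrix of the impulse response of $Z(z)$, have a unit diagonal, and commute as in (22). In the sketch below, the helper names and the coefficients `f = [0.3, -0.2]`, `g = [0.1, 0.05]` (an $m = 2$ example) are hypothetical. These coefficients place the zeros of $Z(z)$ near $0.32$ and $-0.62$ and its poles near $0.24$ and $-0.64$, so $Z(z)$ is stable and minimum phase.

```python
import numpy as np


def lower_toeplitz(first_col, n):
    """n x n lower-triangular Toeplitz matrix whose first column begins with first_col."""
    col = np.zeros(n)
    k = min(n, len(first_col))
    col[:k] = first_col[:k]
    mat = np.zeros((n, n))
    for j in range(n):
        mat[j:, j] = col[: n - j]
    return mat


def impulse_response(num, den, n):
    """First n taps of num(z^{-1}) / den(z^{-1}) for monic coefficient lists.

    Uses the recursion h[t] = num[t] - sum_{k=1}^{m} den[k] h[t-k].
    """
    h = np.zeros(n)
    for t in range(n):
        acc = num[t] if t < len(num) else 0.0
        for k in range(1, min(t, len(den) - 1) + 1):
            acc -= den[k] * h[t - k]
        h[t] = acc
    return h


def channel_matrices(f, g, T):
    """Return (Z_zT, Z_pT, Z_T, Z_direct) for the model (16)-(22), with n = T + 1.

    f = [f_{m-1}, ..., f_0] and g = [g_{m-1}, ..., g_0] as in (16).
    Z_direct is built directly from the impulse response of Z(z).
    """
    n = T + 1
    num = np.concatenate(([1.0], f))                             # Z_z(z) coefficients
    den = np.concatenate(([1.0], np.asarray(f) + np.asarray(g)))  # Z_p(z) coefficients
    Z_zT = lower_toeplitz(num, n)                                # eq. (21)
    Z_pT = lower_toeplitz(den, n)                                # eq. (21)
    Z_T = np.linalg.solve(Z_pT, Z_zT)                            # Z_{p,T}^{-1} Z_{z,T}, eq. (22)
    Z_direct = lower_toeplitz(impulse_response(num, den, n), n)
    return Z_zT, Z_pT, Z_T, Z_direct


if __name__ == "__main__":
    Z_zT, Z_pT, Z_T, Z_direct = channel_matrices([0.3, -0.2], [0.1, 0.05], T=7)
    print(np.max(np.abs(Z_T - Z_direct)))
```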



B. White Gaussian channel with ISI 

The above colored Gaussian channel induces a white Gaussian channel with ISI. More precisely, from (15) and (20) we have

$$ y^T = \mathbf{Z}_T (\mathbf{Z}_T^{-1} u^T + N^T), \tag{23} $$

which we identify as a stable and minimum-phase ISI channel with AWGN $\{N_t\}$; see Fig. 3(b). The inverse $\mathbf{Z}_T^{-1}$ is well defined because $\mathbf{Z}_T$ is lower triangular with diagonal elements $Z(\infty) = 1$. Here $Z^{-1}(z)$ is also at initial rest. Note that $\mathbf{Z}_T^{-1}$, the matrix inverse of $\mathbf{Z}_T$, equals the lower triangular Toeplitz matrix of the impulse response of $Z^{-1}(z)$. For any fixed $u^T$ and $N^T$, (15) and (23) generate the same channel output $y^T$⁹.

The initial rest assumption on Z _1 (z) can be imposed in practice as follows. First, before a transmission, drive 
the initial condition (which is enabled by the controllability requirement stated below) of the ISI channel to any 
desired value that is also known to the receiver a priori. Then, after the transmission, remove the response due to 
that initial condition at the receiver. Such an assumption is also used in [11], [12]. 

We can then write the minimal state-space representation of $Z^{-1}(z)$ as $(F, G, H', 1)$, where $F \in \mathbb{R}^{m \times m}$ is stable, $(F, G)$ is controllable, $(F, H')$ is observable, and $m$ is the dimension or order of $Z^{-1}(z)$. Let us denote the channel from $u$ to $\tilde y$ in Fig. 3(b) by $\mathcal{T}$, where

$$ \tilde y^T := \mathbf{Z}_T^{-1} y^T = \mathbf{Z}_T^{-1} u^T + N^T. \tag{24} $$



9 More rigorously, the mappings from $(u, N)$ to $y$ are T-equivalent. For a discussion of system representations and the equivalence between different representations, see Appendix I.
The channel $\mathcal{T}$ is described in state space as

$$ \text{channel } \mathcal{T}: \quad \begin{cases} s_{t+1} = F s_t + G u_t \\ \tilde y_t = H' s_t + u_t + N_t, \end{cases} \tag{25} $$



where $s_0 = 0$; see Fig. 3(c). Channel $\mathcal{T}$ is not essentially different from the channel from $u$ to $y$, because $\{y_t\}$ and $\{\tilde y_t\}$ causally determine each other. Without loss of generality, we can choose $(F, G, H', 1)$ to have the following observable canonical form:





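$$ F := \begin{bmatrix} \begin{matrix} -f_{m-1} \\ \vdots \\ -f_1 \end{matrix} & I_{m-1} \\ -f_0 & 0_{1 \times (m-1)} \end{bmatrix}, \qquad G := \begin{bmatrix} g_{m-1} \\ \vdots \\ g_0 \end{bmatrix}, \qquad H' := [1 \ 0 \ \cdots \ 0]. \tag{26} $$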



In other words, it holds that $Z^{-1}(z) = H'(zI - F)^{-1} G + 1$. We also have $G_p' = G' + G_z'$ (see (19)). We concentrate on the case $m \geq 1$; the case $m = 0$ (i.e., $\mathcal{T}$ is an AWGN channel) was solved in [1], [2].
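The state-space channel (25) with the canonical form (26) should reproduce the matrix form (24), $\tilde y^T = \mathbf{Z}_T^{-1} u^T + N^T$. By (22), $\mathbf{Z}_T^{-1} = \mathbf{Z}_{z,T}^{-1} \mathbf{Z}_{p,T}$. The sketch below runs the recursion from $s_0 = 0$ and compares the two forms. The function names and the $m = 2$ coefficients are hypothetical illustrations, and `lower_toeplitz` is redefined here so that the block stands alone.

```python
import numpy as np


def lower_toeplitz(first_col, n):
    """n x n lower-triangular Toeplitz matrix whose first column begins with first_col."""
    col = np.zeros(n)
    k = min(n, len(first_col))
    col[:k] = first_col[:k]
    mat = np.zeros((n, n))
    for j in range(n):
        mat[j:, j] = col[: n - j]
    return mat


def canonical_realization(f, g):
    """Observable canonical form (26) of Z^{-1}(z) = H'(zI - F)^{-1} G + 1."""
    m = len(f)
    F = np.zeros((m, m))
    F[:, 0] = -np.asarray(f, dtype=float)  # first column: -f_{m-1}, ..., -f_0
    F[: m - 1, 1:] = np.eye(m - 1)         # block I_{m-1} above the zero last row
    G = np.asarray(g, dtype=float).reshape(m, 1)
    H = np.zeros((m, 1))
    H[0, 0] = 1.0                          # H' = [1 0 ... 0]
    return F, G, H


def simulate_channel(F, G, H, u, N):
    """Channel (25): s_{t+1} = F s_t + G u_t, ytilde_t = H' s_t + u_t + N_t, s_0 = 0."""
    s = np.zeros((F.shape[0], 1))
    ytilde = np.zeros(len(u))
    for t in range(len(u)):
        ytilde[t] = (H.T @ s).item() + u[t] + N[t]
        s = F @ s + G * u[t]
    return ytilde


if __name__ == "__main__":
    f = [0.3, -0.2]   # hypothetical f_{m-1}, ..., f_0 (m = 2)
    g = [0.1, 0.05]   # hypothetical g_{m-1}, ..., g_0
    n = 8
    rng = np.random.default_rng(0)
    u, N = rng.standard_normal(n), rng.standard_normal(n)
    F, G, H = canonical_realization(f, g)
    Z_zT = lower_toeplitz(np.concatenate(([1.0], f)), n)
    Z_pT = lower_toeplitz(np.concatenate(([1.0], np.add(f, g))), n)
    # Matrix form (24): ytilde^T = Z_T^{-1} u^T + N^T with Z_T^{-1} = Z_{z,T}^{-1} Z_{p,T}
    ytilde_matrix = np.linalg.solve(Z_zT, Z_pT @ u) + N
    print(np.max(np.abs(simulate_channel(F, G, H, u, N) - ytilde_matrix)))
```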



IV. The feedback capacity in finite-horizon and the Cover-Pombra structure 
A. The CP structure for the colored Gaussian noise channel and finite-horizon capacity 

We briefly review the CP coding structure for the colored Gaussian noise channel specified in Section III-A (see [6], [36]). Denote the covariance matrix of the colored Gaussian noise $Z^T$ by $K_Z^{(T)}$, and let

$$ u^T := \mathbf{B}_T Z^T + v^T, \tag{27} $$

where $\mathbf{B}_T$ is a $(T+1) \times (T+1)$ strictly lower triangular matrix, and $v^T$ is Gaussian with covariance $K_v^{(T)} \succeq 0$ and independent of $Z^T$¹⁰. The channel output is then

$$ y^T = u^T + Z^T = (I + \mathbf{B}_T) Z^T + v^T. \tag{28} $$



Then $C_T$, the finite-horizon capacity, is defined as the highest information rate that the CP structure can generate:

$$C_T := C_T(\mathcal P) := \sup \frac{1}{2(T+1)}\log\frac{\det K_y^{(T)}}{\det K_Z^{(T)}} = \sup \frac{1}{2(T+1)}\log\frac{\det\big((I + B_T)K_Z^{(T)}(I + B_T)' + K_v^{(T)}\big)}{\det K_Z^{(T)}}, \tag{29}$$

where the supremum is taken over all admissible $K_v^{(T)}$ and $B_T$ satisfying the power constraint

$$\mathcal P_T := \frac{1}{T+1}\operatorname{tr}\big(B_T K_Z^{(T)} B_T' + K_v^{(T)}\big) \le \mathcal P. \tag{30}$$



This finite-horizon capacity $C_T$ is the operational capacity, as given by Theorem 1 of [6] based on the AEP and a random coding argument.¹¹ Thus, we may focus only on information rates in this paper and need not discuss coding in the operational sense.
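For a single candidate pair $(B_T, K_v^{(T)})$, the objective in (29) and the constraint (30) are plain log-determinant and trace computations. The Python/NumPy sketch below evaluates both, using natural logarithms (so rates are in nats per channel use); the function name is hypothetical, and the supremum in (29) would still require an outer search over $(B_T, K_v^{(T)})$, which is not shown.

```python
import numpy as np


def cp_rate_and_power(B, Kv, KZ):
    """Objective of (29) and left-hand side of (30) for one (B_T, K_v^{(T)}).

    B  : strictly lower-triangular (T+1)x(T+1) matrix B_T
    Kv : covariance K_v^{(T)} of v^T, assumed independent of Z^T
    KZ : covariance K_Z^{(T)} of the colored noise Z^T (positive definite)
    Returns (rate in nats per channel use, average input power).
    """
    n = B.shape[0]                                  # n = T + 1
    if not np.allclose(np.triu(B), 0.0):
        raise ValueError("B_T must be strictly lower triangular (strictly causal feedback)")
    IB = np.eye(n) + B
    Ky = IB @ KZ @ IB.T + Kv                        # covariance of y^T in (28)
    rate = (np.linalg.slogdet(Ky)[1] - np.linalg.slogdet(KZ)[1]) / (2 * n)
    power = np.trace(B @ KZ @ B.T + Kv) / n         # E||u^T||^2 / (T + 1), cf. (27)
    return rate, power


# Sanity check: white noise and no feedback give the AWGN value 0.5*log(1 + P).
n, P = 5, 2.0
print(cp_rate_and_power(np.zeros((n, n)), P * np.eye(n), np.eye(n)))
```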

10 This $v^T$ is called innovations in [12], [36]; it should not be confused with the KF innovations in this paper.
11 One can also invoke Theorem 5.1 in [11] and the equivalence between directed information and mutual information in this case to claim that $C_T$ is also the operational capacity.
Directly using the CP structure to construct a coding scheme is generally viewed as challenging for the following reasons. a) Its computational complexity grows faster than linearly with time ($O((T+1)^2)$ unknowns to be solved for each T), even though for each T the search for $K_v^{(T)}$ and $B_T$ can be posed as a convex problem [19]. b) For each T the optimal $K_v^{(T)}$ and $B_T$ are not unique (in fact, there are uncountably many optimizing solutions for each T, as can easily be seen from the T = 1 case); moreover, the optimal solution for coding length (T + 1) does not necessarily contain a part that is optimal for coding length T. Hence the search for the optimal $K_v^{(T)}$ and $B_T$ for one T is not likely to suggest what the optimal coding scheme could be for any other time horizon. c) In [6] the achievability of $C_T$ is proven using a random coding argument, but a specific practical code has not been proposed for or applied to the CP structure. Nevertheless, many insights can be obtained from the CP structure, and it is also the starting point of our development.

B. The CP structure for the ISI Gaussian channel 

In light of the correspondence between the colored Gaussian noise channel and the ISI channel T, we can derive the CP coding structure for T, which is obtained from (27) by introducing a new quantity $r^T$ as

$$r^T := (I + B_T)^{-1} v^T. \tag{31}$$

By $Z^T = Z_T N^T$ and $y^T = Z_T \tilde y^T$, we have

$$\begin{aligned} u^T &= B_T Z_T N^T + (I + B_T) r^T, \\ \tilde y^T &= Z_T^{-1}(I + B_T) Z_T N^T + Z_T^{-1}(I + B_T) r^T = Z_T^{-1}(I + B_T)(Z_T N^T + r^T). \end{aligned} \tag{32}$$

This implies that the channel input $u^T$ can be represented as

$$u^T = (I + B_T)^{-1} B_T Z_T \tilde y^T + r^T, \tag{33}$$

which leads to the block diagram in Fig. 4. Then the capacity $C_T$ has the form:

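$$\begin{aligned} C_T(\mathcal P) &= \sup \frac{1}{2(T+1)} \log\det K_{\tilde y}^{(T)} \\ &= \sup \frac{1}{2(T+1)} \log\det\big(Z_T^{-1}(I + B_T)(Z_T Z_T' + K_r^{(T)})(I + B_T)'(Z_T^{-1})'\big) \\ &= \sup \frac{1}{2(T+1)} \log\det\big(Z_T Z_T' + K_r^{(T)}\big), \end{aligned} \tag{34}$$

where the last equality holds because $Z_T$ and $I + B_T$ are unit lower triangular (so their determinants equal 1), and the supremum is over $K_r^{(T)}$ and $B_T$ subject to the power constraint

$$\mathcal P_T := \frac{1}{T+1}\operatorname{tr}\big(B_T Z_T Z_T' B_T' + (I + B_T) K_r^{(T)} (I + B_T)'\big) \le \mathcal P. \tag{35}$$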



The capacity in this form is equivalent to (29). Another form of the capacity based on the directed information, namely an input/output characterization, can be shown to be equivalent to the above form; see Appendix III. One can also define the inverse function of $C_T(\mathcal P)$ as $\mathcal P_T(\mathcal R)$, which is equal to the infimum power subject to the rate constraint

$$\frac{1}{2(T+1)} \log\det K_{\tilde y}^{(T)} \ge \mathcal R. \tag{36}$$
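The chain of equalities in (34), and the agreement of (35) with (30) under $K_Z^{(T)} = Z_T Z_T'$ and $K_v^{(T)} = (I + B_T)K_r^{(T)}(I + B_T)'$ (which follows from (31)), rely only on $Z_T$ and $I + B_T$ being unit lower triangular. The Python/NumPy sketch below checks this with arbitrary, hypothetical test matrices, and also confirms the input representation (33); the helper names are illustrative only.

```python
import numpy as np


def isi_cp_forms(Z, B, Kr):
    """Evaluate the log-det forms in (34), the rate form (29), and the powers (35), (30).

    Z  : unit lower-triangular Toeplitz matrix Z_T
    B  : strictly lower-triangular B_T
    Kr : covariance K_r^{(T)} of r^T
    """
    n = Z.shape[0]
    IB, Zi = np.eye(n) + B, np.linalg.inv(Z)
    K_yt = Zi @ IB @ (Z @ Z.T + Kr) @ IB.T @ Zi.T          # covariance of y~^T, from (32)
    KZ, Kv = Z @ Z.T, IB @ Kr @ IB.T                         # K_Z^{(T)} and K_v^{(T)} via (31)
    logdets = (
        np.linalg.slogdet(K_yt)[1],                          # first form in (34)
        np.linalg.slogdet(Z @ Z.T + Kr)[1],                  # last form in (34)
        np.linalg.slogdet(IB @ KZ @ IB.T + Kv)[1] - np.linalg.slogdet(KZ)[1],  # (29)
    )
    powers = (
        np.trace(B @ Z @ Z.T @ B.T + IB @ Kr @ IB.T) / n,    # (35)
        np.trace(B @ KZ @ B.T + Kv) / n,                     # (30)
    )
    return logdets, powers


def input_via_33(Z, B, r, N):
    """Return u^T from (32) and the right-hand side of (33); the two should coincide."""
    IB = np.eye(Z.shape[0]) + B
    u = B @ Z @ N + IB @ r                                   # first line of (32)
    y_t = np.linalg.solve(Z, u) + N                          # y~^T = Z_T^{-1} u^T + N^T
    return u, np.linalg.solve(IB, B @ Z @ y_t) + r           # (33)
```

For instance, with a Toeplitz $Z_T$ whose impulse response is $1, 0.6, 0.6^2, \ldots$, a random strictly lower-triangular $B_T$, and a low-rank $K_r^{(T)}$, the three log-determinants agree and the two power expressions coincide, as in the test below.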



V. Necessity of KF for optimal coding 

In this section, we consider a finite-horizon feedback coding structure over channel T, denoted S, which is a variation of the CP structure. This variation is useful since: 1) searching over all possible parameters in the structure achieves $C_T$, that is, there is no loss of generality or optimality when focusing on this structure only; 2) we can show that to ensure power efficiency (to be explained), structure S necessarily implements the KF algorithm. This implies that our KF characterization leads to a refinement of the CP structure.
[Figure: blocks labeled "channel T", $(I + B_T)^{-1}$, and $(I + B_T)^{-1}B_T Z_T$]

Fig. 4. The block diagram of the CP structure for ISI Gaussian channel T.



A. Coding structure S

Fig. 5 illustrates the coding structure S, including the encoder and the feedback generator, which is a portion of the decoder. (How the decoder produces the estimate of the decoded message will be considered shortly.) Below, we fix the time horizon to span from time 0 to time T and describe S.



[Figure: encoder, channel T, and feedback generator $\mathcal G_T$ connected in a loop]

Fig. 5. Coding structure S for channel T.



Encoder: The encoder follows the dynamics

$$\text{Encoder:}\quad \begin{cases} x_{t+1} = A x_t \\ r_t = C' x_t \\ u_t = r_t - f_t, \end{cases} \tag{37}$$

where $x_0 := W \sim \mathcal N(0, I_{n+1})$. We assume that the encoder dimension (n + 1) is a fixed integer satisfying $0 \le n \le T$; $A \in \mathbb R^{(n+1)\times(n+1)}$; $C \in \mathbb R^{n+1}$; and the assumption (A1) holds:

(A1): $(A, C')$ is observable.

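We then let

$$\begin{aligned} \Gamma_T(A, C) := \Gamma_T &:= [C,\ A'C,\ \ldots,\ (A')^T C]' \in \mathbb R^{(T+1)\times(n+1)}, \\ K_r^{(T)}(A, C) := K_r^{(T)} &:= \mathbb E\, r^T (r^T)' \in \mathbb R^{(T+1)\times(T+1)}. \end{aligned} \tag{38}$$

Therefore, $\Gamma_n$ is the observability matrix for $(A, C')$ and is invertible, $\Gamma_T$ has rank (n + 1), $r^T = \Gamma_T W$, and $K_r^{(T)} = \Gamma_T \Gamma_T'$ with rank (n + 1).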
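The objects in (38) are easy to form explicitly. The Python/NumPy sketch below (helper names are hypothetical) builds $\Gamma_T$ row by row as $C'A^t$, forms $K_r^{(T)} = \Gamma_T\Gamma_T'$, and checks the rank claims for an observable example pair with n = 1.

```python
import numpy as np


def gamma_T(A, C, T):
    """Gamma_T = [C, A'C, ..., (A')^T C]'; row t equals C' A^t, so r^T = Gamma_T W."""
    rows, Ak = [], np.eye(A.shape[0])
    for _ in range(T + 1):
        rows.append(C @ Ak)        # C' A^t
        Ak = A @ Ak
    return np.array(rows)


def K_r(A, C, T):
    """K_r^{(T)} = E r^T r^T' = Gamma_T Gamma_T', since W ~ N(0, I_{n+1})."""
    Gm = gamma_T(A, C, T)
    return Gm @ Gm.T


# Example (n = 1): (A, C') is observable because [C'; C'A] is invertible.
A = np.array([[0.9, 1.0], [0.0, 1.2]])
C = np.array([1.0, 0.0])
print(np.linalg.matrix_rank(gamma_T(A, C, 5)), np.linalg.matrix_rank(K_r(A, C, 5)))  # 2 2
```

Even though $K_r^{(T)}$ is $(T+1)\times(T+1)$, its rank never exceeds the encoder dimension n + 1, which is why letting T grow with n fixed keeps the message dimension fixed.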

Feedback generator: The feedback signal $(-f_t)$ is generated through a feedback generator $\mathcal G_T$, i.e.,

$$-f^T = \mathcal G_T \tilde y^T, \tag{39}$$

where $\mathcal G_T \in \mathbb R^{(T+1)\times(T+1)}$ is a strictly lower triangular matrix, namely, the output feedback is strictly causal.

Throughout the paper, the above assumptions on the encoder/decoder are always in force unless otherwise specified. For future use, we compute the channel output as

$$\tilde y^T = (I - Z_T^{-1}\mathcal G_T)^{-1}(Z_T^{-1} r^T + N^T). \tag{40}$$
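Equation (40) is the batch form of a strictly causal loop. The Python/NumPy sketch below, with a hypothetical random matrix standing in for $\mathcal G_T$ and a first-order example for $Z_T^{-1}$, runs structure S one time step at a time (the feedback at time t uses only $\tilde y_0, \ldots, \tilde y_{t-1}$) and compares the result with (40).

```python
import numpy as np


def simulate_S(Zinv, Gcal, r, N):
    """Step through structure S: u_t = r_t - f_t, -f_t = (G_T y~)_t, y~^T = Z_T^{-1} u^T + N^T."""
    n = len(r)
    u, y = np.zeros(n), np.zeros(n)
    for t in range(n):
        minus_f = Gcal[t, :t] @ y[:t]                 # strictly causal: only y~_0..y~_{t-1}
        u[t] = r[t] + minus_f                         # u_t = r_t - f_t
        y[t] = Zinv[t, :t + 1] @ u[:t + 1] + N[t]     # output of channel T at time t
    return u, y


def output_via_40(Zinv, Gcal, r, N):
    """Batch formula (40): y~^T = (I - Z_T^{-1} G_T)^{-1} (Z_T^{-1} r^T + N^T)."""
    return np.linalg.solve(np.eye(len(r)) - Zinv @ Gcal, Zinv @ r + N)


rng = np.random.default_rng(2)
n = 6
Zinv = np.array([[0.5 ** (i - j) if i >= j else 0.0 for j in range(n)] for i in range(n)])
Gcal = 0.3 * np.tril(rng.standard_normal((n, n)), -1)   # any strictly lower-triangular matrix
r, N = rng.standard_normal(n), rng.standard_normal(n)
_, y_loop = simulate_S(Zinv, Gcal, r, N)
print(np.allclose(y_loop, output_via_40(Zinv, Gcal, r, N)))   # True
```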
Definition 1. Consider the coding structure S shown in Fig. 5. Define the constrained capacity

$$C_{T,n} := C_{T,n}(\mathcal P) := \sup_{A \in \mathbb R^{(n+1)\times(n+1)},\, C,\, \mathcal G_T,\, (A1)} \frac{1}{T+1} I(W; \tilde y^T) \quad \text{s.t. } \mathbb E\|u^T\|^2/(T+1) \le \mathcal P, \tag{41}$$

and define its inverse function as $\mathcal P_{T,n}(\mathcal R)$, that is,

$$\mathcal P_{T,n} := \mathcal P_{T,n}(\mathcal R) := \inf_{A \in \mathbb R^{(n+1)\times(n+1)},\, C,\, \mathcal G_T,\, (A1)} \frac{1}{T+1}\mathbb E\|u^T\|^2 \quad \text{s.t. } I(W; \tilde y^T)/(T+1) \ge \mathcal R. \tag{42}$$

In other words, $C_{T,n}$ is the finite-horizon information capacity for a fixed encoder dimension (n + 1), obtained by searching over all admissible A, C, and $\mathcal G_T$ of appropriate dimensions. The pair $(\mathcal P, C_{T,n}(\mathcal P))$ and the pair $(\mathcal P_{T,n}(\mathcal R), \mathcal R)$ specify the optimal tradeoff between the channel input power and information rate for the communication problem with fixed encoder dimension.

B. Relation between the CP structure and the proposed structure S

The coding structure S over T in Fig. 5 was motivated by, and is tightly associated with, the CP structure over the ISI Gaussian channel T in Fig. 4. Let $u^T(K_r^{(T)}, B_T)$ and $u^T(A, C, \mathcal G_T)$ denote the input sequences generated by encoders with $(K_r^{(T)}, B_T)$ and $(A \in \mathbb R^{(T+1)\times(T+1)}, C, \mathcal G_T)$, respectively.

Lemma 1. i) For any given pair $(K_r^{(T)}, B_T)$ with $K_r^{(T)} \succ 0$, there exists an admissible triple $(A \in \mathbb R^{(T+1)\times(T+1)}, C, \mathcal G_T)$ such that $u^T(K_r^{(T)}, B_T) = u^T(A, C, \mathcal G_T)$; for any given pair $(K_r^{(T)}, B_T)$ with $K_r^{(T)} \succeq 0$ but $K_r^{(T)} \not\succ 0$, there exists a sequence of admissible triples $\{(A_i \in \mathbb R^{(T+1)\times(T+1)}, C_i, \mathcal G_{T,i})\}_{i=1}^{\infty}$ such that $u^T(K_r^{(T)}, B_T) = \lim_{i\to\infty} u^T(A_i, C_i, \mathcal G_{T,i})$;

ii) For any given triple $(A \in \mathbb R^{(T+1)\times(T+1)}, C, \mathcal G_T)$, there is an admissible pair $(K_r^{(T)}, B_T)$ such that $u^T(K_r^{(T)}, B_T) = u^T(A, C, \mathcal G_T)$;

iii)

$$C_T(\mathcal P) = C_{T,T}(\mathcal P), \qquad \mathcal P_T(\mathcal R) = \mathcal P_{T,T}(\mathcal R). \tag{43}$$

Proof: See Appendix III. □
One advantage of considering the structure S is that it allows T > n, which makes it possible to increase the horizon length to infinity without increasing the dimension of A, a useful step towards the KF characterization of the feedback communication problem.

In what follows, several refinements to the coding structure S will be presented.

C. The presence of the KF

We first compute the mutual information in the aforementioned coding structure S.

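Proposition 1. Consider the structure S in Fig. 5. Let $0 \le n \le T$, let $(A, C)$ be observable with $A \in \mathbb R^{(n+1)\times(n+1)}$, and let $\mathcal G_T$ be strictly lower triangular. Then

i) It holds that

$$\begin{aligned} I(W; \tilde y^T) &= I(r^T; \tilde y^T) = I(u^T \to \tilde y^T) = \frac12 \log\det K_{\tilde y}^{(T)} \\ &= \frac12 \log\det\big(I + Z_T^{-1} K_r^{(T)} (Z_T^{-1})'\big) = \frac12 \log\det\big(I + Z_T^{-1}\Gamma_T\Gamma_T'(Z_T^{-1})'\big); \end{aligned} \tag{44}$$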
ii) $I(W; \tilde y^T)$ is independent of the feedback generator $\mathcal G_T$.

Proof: i)

$$\begin{aligned} I(W; \tilde y^T) &= h(\tilde y^T) - h(\tilde y^T \mid W) \\ &= h(\tilde y^T) - h\big((I - Z_T^{-1}\mathcal G_T)^{-1}(Z_T^{-1} r^T + N^T) \mid W\big) \\ &\overset{(a)}{=} \frac12 \log\det\big(2\pi e K_{\tilde y}^{(T)}\big) - h(N^T) \\ &\overset{(b)}{=} I(u^T \to \tilde y^T) \\ &= \frac12 \log\det K_{\tilde y}^{(T)} = \frac12 \log\det\big(I + Z_T^{-1} K_r^{(T)} (Z_T^{-1})'\big), \end{aligned} \tag{45}$$

where (a) is due to $r^T = \Gamma_T W$, $\det(AB) = \det A \det B$, and $\det(I - Z_T^{-1}\mathcal G_T)^{-1} = 1$; and (b) follows from [14] or a direct computation of $I(u^T \to \tilde y^T)$. ii) It is clear from i) that $I(W; \tilde y^T)$ is independent of the feedback generator $\mathcal G_T$ and depends only on $K_r^{(T)}$, or equivalently on $(A, C)$. □
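Part ii) can be observed numerically: because $\det(I - Z_T^{-1}\mathcal G_T) = 1$, the log-determinant of $K_{\tilde y}^{(T)}$ does not change when the feedback generator changes, while the input power generally does. The Python/NumPy sketch below uses hypothetical example matrices for $Z_T^{-1}$, $K_r^{(T)}$, and two feedback generators (open loop and a random strictly causal one).

```python
import numpy as np


def info_and_power(Zinv, Gcal, Kr):
    """Return (I(W; y~^T) in nats via (44), E||u^T||^2/(T+1)) for structure S.

    With S := (I - Z^{-1} G)^{-1}: y~ = S (Z^{-1} r + N) by (40) and u = r + G y~ by (39).
    """
    n = Kr.shape[0]
    I = np.eye(n)
    S = np.linalg.inv(I - Zinv @ Gcal)
    K_yt = S @ (Zinv @ Kr @ Zinv.T + I) @ S.T        # covariance of y~^T
    info = 0.5 * np.linalg.slogdet(K_yt)[1]          # (1/2) log det K_y~ ; h(N^T) cancels
    Mr, Mn = I + Gcal @ S @ Zinv, Gcal @ S           # u = Mr r + Mn N
    power = np.trace(Mr @ Kr @ Mr.T + Mn @ Mn.T) / n
    return info, power


rng = np.random.default_rng(4)
n = 6
Zinv = np.array([[0.5 ** (i - j) if i >= j else 0.0 for j in range(n)] for i in range(n)])
M = rng.standard_normal((n, 2)); Kr = M @ M.T          # rank 2, as for an encoder with n + 1 = 2
for Gcal in (np.zeros((n, n)), 0.4 * np.tril(rng.standard_normal((n, n)), -1)):
    print(info_and_power(Zinv, Gcal, Kr))             # same information, typically different power
```

This is the numerical counterpart of the remark that follows: since the rate is pinned down by $(A, C)$, the only job left for $\mathcal G_T$ is to reduce power.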

Remark 1. Though simple, Proposition 1 has interesting interpretations and implications. The first equality of i) shows that the mutual information between the message W and the channel output $\tilde y^T$ is completely preserved in the mutual information between the message-carrying signal $r^T$ and the channel output $\tilde y^T$. The second equality shows that the directed information (cf. [11] and Appendix III) in this setup is equivalent to the message-output characterization based on the mutual information, which is convenient in many situations. The third equality involves the output covariance matrix, a link towards the Bode waterbed effect and the fundamental concept of the KF innovations (to be explored in subsequent sections). The rest of the proposition implies that, for the given channel $Z_T^{-1}$, a fixed $(A, C)$ leads to a fixed information rate regardless of the feedback generator. In fact, the mutual information may be interpreted as anti-causal and independent of the strictly causal feedback generator. Hence the feedback generator $\mathcal G_T$ has to be chosen to minimize the average channel input power in order to achieve the capacity (recall that the capacity problem can be expressed as minimizing power while fixing the rate; see (42)), which necessitates a KF. Note that the infinite-horizon counterpart of this proposition was proven in [14].

Next we solve for the optimal feedback generator for a fixed $(A, C)$, which is essentially a KF. Denote the optimal feedback generator for a given $(A, C)$ as $\mathcal G_T^*(A, C)$, namely

$$\mathcal G_T^*(A, C) := \arg\inf_{\mathcal G_T} \frac{1}{T+1}\mathbb E\|u^T(A, C, \mathcal G_T)\|^2. \tag{46}$$

By Proposition 1, we can define, for a fixed $(A, C)$, the information rate across the channel to be

$$R_T(A, C) := \frac{I(W; \tilde y^T)}{T+1}. \tag{47}$$

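Proposition 2. Consider coding structure S in Fig. 5. Fix any $0 \le n \le T$. Then

i) (recall the definition of $\mathcal P_{T,n}(\mathcal R)$ in (42))

$$\mathcal P_{T,n}(\mathcal R) = \inf_{A \in \mathbb R^{(n+1)\times(n+1)},\, C,\, (A1)} \frac{1}{T+1}\mathbb E\big\|u^T\big(A, C, \mathcal G_T^*(A, C)\big)\big\|^2 \quad \text{s.t. } R_T(A, C) \ge \mathcal R; \tag{48}$$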

ii) The optimal feedback generator $\mathcal G_T^*(A, C)$ is given by

$$\mathcal G_T^*(A, C) = -G_T^*(A, C)\big(I - Z_T^{-1} G_T^*(A, C)\big)^{-1}, \tag{49}$$

where $G_T^*(A, C)$ is the one-step prediction MMSE estimator (Kalman filter) of $r^T$ given the noisy observation $\bar y^T := Z_T^{-1} r^T + N^T$ (i.e., the optimal one-step prediction is $\hat r^T = G_T^*(A, C)\bar y^T$), given by

$$G_T^*(A, C) := \arg\min_{G_T} \frac{1}{T+1}\mathbb E(r^T - G_T \bar y^T)(r^T - G_T \bar y^T)', \tag{50}$$

where $G_T$ is strictly lower triangular.

Fig. 6(a) shows the associated estimation problem, (b) the KF $G_T^*(A, C)$ for (a), and (c) the state-space representation of the optimal feedback generator $\mathcal G_T^*(A, C)$ (see (64) and (65) for $L_{1,t}$ and $L_{2,t}$).



[Figure: (a) unknown source, channel T with noise $N_t$, and estimator; (b) Kalman filter; (c) state-space blocks of the KF-based feedback generator]

Fig. 6. (a) An estimation problem over channel T. (b) The KF $G_T^*(A, C)$ for (a). (c) The KF-based feedback generator $\mathcal G_T^*(A, C)$ in state space. $(A, L_{1,t}, -C', 0)$ with $\hat x_t$ denotes a state-space representation with $\hat x_t$ being its state at time t and initial condition $\hat x_0 = 0$.



Remark 2. Proposition 2 reveals that the minimization of channel input power in a feedback communication problem is equivalent to the minimization of MSE in an estimation problem. This equivalence yields a complete characterization (in terms of the KF algorithm) of the optimal feedback generator $\mathcal G_T^*(A, C)$ for any given $(A, C)$, as shown in Section VI-B. This proposition refines the CP structure, as it shows that the CP structure necessarily contains a KF.

Remark 3. Proposition 2 i) implies that we may reformulate the problem of $C_{T,n}$ (or $\mathcal P_{T,n}$) as a two-step problem. STEP 1: Fix $(A, C)$ (and hence fix the rate), and minimize the input power by searching over all possible feedback generators $\mathcal G_T$ for the fixed $(A, C)$. STEP 2: Search over all possible $(A, C)$ subject to the rate constraint $R_T(A, C) \ge \mathcal R$. Thus, one essential role of the feedback generator $\mathcal G_T$ for any fixed $(A, C)$ is to minimize the input power, which can be solved by considering the equivalent optimal estimation problem in Fig. 6(a), whose solution is the KF. It also follows that $\mathbb E(u_t \mid \tilde y^{t-1}) = 0$, which implies no power waste due to a non-zero mean (cf. [12], Eq. (126)) and a center-of-gravity encoding rule (cf. [37]). The input generated by the KF-based feedback generator has the form

$$u_t = r_t - \mathbb E(r_t \mid \tilde y^{t-1}), \tag{51}$$

which is related to the optimal input distributions obtained by, e.g., [12], [16], [38], [39].

We also remark that the necessity of the KF in the optimal coding scheme is not surprising, given various indications of the essential role of the KF (or of minimum mean-square error (MMSE) estimators; or of cheap control, its control-theoretic equivalent; or of the sum-product algorithm, its generalization) in optimal communication designs. See, e.g., [12], [14], [33], [40]-[42]. The study of the KF in the feedback communication problem along the lines of [42] may provide important insights into optimal communication problems and is under current investigation.
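Proof: i) Notice that for any fixed $(A, C)$, $R_T(A, C)$ is fixed. Then from the definition of $\mathcal P_{T,n}(\mathcal R)$, we have

$$\begin{aligned} \mathcal P_{T,n}(\mathcal R) &= \inf_{A, C, \mathcal G_T, (A1)} \frac{1}{T+1}\mathbb E\|u^T(A, C, \mathcal G_T)\|^2 \quad \text{s.t. } R_T(A, C) \ge \mathcal R \\ &= \inf_{A, C, (A1)}\ \inf_{\mathcal G_T} \frac{1}{T+1}\mathbb E\|u^T(A, C, \mathcal G_T)\|^2 \quad \text{s.t. } R_T(A, C) \ge \mathcal R. \end{aligned} \tag{52}$$

Then i) follows from the definition of $\mathcal G_T^*(A, C)$.

ii) Note that for the coding structure S, it holds that

$$u^T = r^T + (-f^T) = r^T + \mathcal G_T\tilde y^T. \tag{53}$$

Then, letting

$$\bar G_T := -\mathcal G_T(I - Z_T^{-1}\mathcal G_T)^{-1} \tag{54}$$

and $\bar y^T := Z_T^{-1} r^T + N^T$, we have $\mathcal G_T\tilde y^T = -\bar G_T\bar y^T$ by (40). The map (54) is a bijection on strictly lower triangular matrices, with inverse $\mathcal G_T = -\bar G_T(I - Z_T^{-1}\bar G_T)^{-1}$. Therefore,

$$\begin{aligned} \mathcal G_T^*(A, C) &= \arg\inf_{\mathcal G_T} \frac{1}{T+1}\mathbb E(r^T + \mathcal G_T\tilde y^T)(r^T + \mathcal G_T\tilde y^T)', \\ \bar G_T^* &= \arg\inf_{\bar G_T} \frac{1}{T+1}\mathbb E(r^T - \bar G_T\bar y^T)(r^T - \bar G_T\bar y^T)'. \end{aligned} \tag{55}$$

The last line implies that the optimal $\bar G_T$ is the strictly causal MMSE estimator (with one-step prediction) of $r^T$ given $\bar y^T$; notice that $\bar G_T$ is strictly lower triangular. It is well known that such an estimator can be implemented recursively in state space as a KF (cf. [23], [43]), i.e., $\bar G_T^* = G_T^*(A, C)$. Finally, from the relation between $\mathcal G_T$ and $\bar G_T$, we obtain (49). The state-space representation of $\mathcal G_T^*(A, C)$, as illustrated in Fig. 6(c), can be obtained from straightforward computation, as shown in Appendix II-A. □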
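The argument can also be checked in batch form, without any recursion. Under the assumptions of the example below (hypothetical $Z_T^{-1}$ and $K_r^{(T)}$; all helper names illustrative), the Python/NumPy sketch computes the strictly causal MMSE predictor $\bar G_T^*$ row by row from covariances, maps it to a feedback generator with (49), and compares the resulting input power with the prediction error of the estimator and with a randomly chosen strictly lower-triangular feedback generator. It illustrates the proof, not the recursive KF implementation.

```python
import numpy as np


def causal_predictor(Zinv, Kr):
    """Strictly lower-triangular G with (G ybar)_t = E(r_t | ybar_0..ybar_{t-1}), ybar = Z^{-1} r + N."""
    n = Kr.shape[0]
    K_yb = Zinv @ Kr @ Zinv.T + np.eye(n)         # Cov(ybar^T)
    K_ryb = Kr @ Zinv.T                           # Cov(r^T, ybar^T)
    G = np.zeros((n, n))
    for t in range(1, n):
        G[t, :t] = np.linalg.solve(K_yb[:t, :t], K_ryb[t, :t])
    return G


def flip(Zinv, X):
    """X -> -X (I - Z^{-1} X)^{-1}: this is (54), and applied to Gbar it gives (49)."""
    return -X @ np.linalg.inv(np.eye(X.shape[0]) - Zinv @ X)


def input_power(Zinv, Gcal, Kr):
    """E||u^T||^2 / (T+1) for u^T = r^T + G_T y~^T, with y~^T given by (40)."""
    n = Kr.shape[0]
    S = np.linalg.inv(np.eye(n) - Zinv @ Gcal)
    Mr, Mn = np.eye(n) + Gcal @ S @ Zinv, Gcal @ S
    return np.trace(Mr @ Kr @ Mr.T + Mn @ Mn.T) / n


def prediction_mse(Zinv, Kr, G):
    """trace E(r - G ybar)(r - G ybar)' / (T+1)."""
    n = Kr.shape[0]
    K_yb, K_ryb = Zinv @ Kr @ Zinv.T + np.eye(n), Kr @ Zinv.T
    P = Kr - G @ K_ryb.T - K_ryb @ G.T + G @ K_yb @ G.T
    return np.trace(P) / n


rng = np.random.default_rng(6)
n = 6
Zinv = np.array([[0.5 ** (i - j) if i >= j else 0.0 for j in range(n)] for i in range(n)])
M = rng.standard_normal((n, 2)); Kr = M @ M.T
Gopt = flip(Zinv, causal_predictor(Zinv, Kr))            # optimal feedback generator via (49)
Grand = 0.4 * np.tril(rng.standard_normal((n, n)), -1)    # an arbitrary strictly causal alternative
print(input_power(Zinv, Gopt, Kr) <= input_power(Zinv, Grand, Kr))   # True
```

Note that the same function implements both (54) and (49): the map is its own inverse, which is the bijection used in the proof.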
We remark that it is possible to derive a dynamic programming based solution ([11]) to compute $C_{T,n}$, and if we further employ the Markov property in [12] and the above KF-based characterization, we would reach a solution with complexity O(T) for computing $C_{T,n}$ and $\mathcal G_T$. However, we do not pursue this line in this paper, as it is beyond its main scope.



VI. Connections among feedback communication, estimation, and feedback control 

We have shown that in the coding structure S, to ensure power efficiency for a fixed $(A, C)$, one needs to design a KF-based feedback generator. The KF immediately links the feedback communication problem to estimation and control problems. In this section, we present a unified representation of the optimal coding structure S* (i.e., S with $\mathcal G_T$ chosen as $\mathcal G_T^*(A, C)$), its estimation-theory counterpart, and its control-theory counterpart. Then in the next section we will establish relations among the information-theoretic, estimation-theoretic, and control-theoretic quantities.



A. Unified representation of feedback coding system, KF, and cheap control 
Coding structure S*
The optimal feedback generator for a given $(A, C)$ is given in (49); see Fig. 6(c) for its structure. We can then obtain a state-space representation of the optimal feedback generator $\mathcal G_T^*(A, C)$ and describe the coding structure S*, which contains $\mathcal G_T^*(A, C)$, as

$$\text{coding structure S*:}\quad \begin{cases} x_{t+1} = A x_t, \quad r_t = C'x_t, \quad u_t = r_t - f_t & \text{(encoder)} \\ s_{t+1} = F s_t + G u_t, \quad \tilde y_t = H' s_t + u_t + N_t & \text{(channel T)} \\ \hat x_{t+1} = A \hat x_t + L_{1,t} e_t, \quad \check s_{t+1} = F \check s_t + L_{2,t} e_t, & \\ e_t = \tilde y_t - H'\check s_t, \quad f_t = C'\hat x_t & \text{(optimal feedback generator } \mathcal G_T^*(A, C)) \end{cases} \tag{56}$$

with $x_0 = W$ unknown, $s_0 = \check s_0 = 0$, and $\hat x_0 = 0$. Here $L_{1,t} \in \mathbb R^{n+1}$ and $L_{2,t} \in \mathbb R^{m}$ are the time-varying KF gains specified in (64). See Appendix II-A for the derivation of a state-space representation of $\mathcal G_T^*(A, C)$.
The estimation system 

The estimation system in Fig. 6(a) and (b) consists of three parts: the unknown source $r^T$ to be estimated or tracked, the channel T (without output feedback), and the estimator, which we choose as the KF $G_T^*$. We assume that $(A, C)$ is fixed and known to the estimator, and hence the randomness in $r^T$ comes from the initial condition of $r^T$. The system is described in state space as

$$\text{estimation system:}\quad \begin{cases} x_{t+1} = A x_t, \quad r_t = C'x_t & \text{(unknown source)} \\ s_{t+1} = F s_t + G r_t, \quad \bar y_t = H' s_t + r_t + N_t & \text{(channel T)} \\ \hat x_{t+1} = A\hat x_t + L_{1,t} e_t, \quad \hat s_{t+1} = F\hat s_t + G\hat r_t + L_{2,t} e_t, & \\ \hat r_t = C'\hat x_t, \quad e_t = \bar y_t - H'\hat s_t - \hat r_t & \text{(Kalman filter } G_T^*(A, C)) \end{cases} \tag{57}$$

with $x_0 = W$, $s_0 = \hat s_0 = 0$, and $\hat x_0 = 0$. To write this in a more compact form, define



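$$X_t := \begin{bmatrix} x_t \\ s_t \end{bmatrix},\qquad \mathcal A := \begin{bmatrix} A & 0 \\ GC' & F \end{bmatrix},\qquad \mathcal H := \begin{bmatrix} C \\ H \end{bmatrix},\qquad L_t := \begin{bmatrix} L_{1,t} \\ L_{2,t} \end{bmatrix}. \tag{58}$$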



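Then we have

$$\text{estimation system:}\quad \begin{cases} X_{t+1} = \mathcal A X_t, \quad \bar y_t = \mathcal H' X_t + N_t & \text{(unknown source and channel T)} \\ \hat X_{t+1} = \mathcal A \hat X_t + L_t e_t, \quad e_t = \bar y_t - \mathcal H'\hat X_t & \text{(Kalman filter } G_T^*(A, C)) \end{cases} \tag{59}$$

with $X_0 = [W', 0']'$ and $\hat X_0 = 0$.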

It can be easily shown that $r_t$, $x_t$, $e_t$, and $\hat x_t$ take the same values in (57) and (56), that $\hat r_t$ in (57) equals $f_t$ in (56), and that, for any t,

$$\hat s_t - s_t \ \text{in (57)} \;=\; \check s_t - s_t \ \text{in (56)}, \tag{60}$$

which leads to the following unified representation as a control system.
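The unified representation: A cheap control problem
Define

$$\tilde X_t := X_t - \hat X_t = \begin{bmatrix} x_t - \hat x_t \\ s_t - \hat s_t \end{bmatrix}, \qquad \mathcal D := \begin{bmatrix} C \\ 0_{m\times 1} \end{bmatrix}. \tag{61}$$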



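Note that $\tilde X_t$ is the estimation error for $X_t$. Substituting (61) into (57) and (56), we obtain that both systems become

$$\text{control system:}\quad \begin{cases} \tilde X_{t+1} = (\mathcal A - L_t\mathcal H')\tilde X_t - L_t N_t = \mathcal A\tilde X_t - L_t e_t & \text{(state evolution)} \\ e_t = \mathcal H'\tilde X_t + N_t & \text{(noisy measurement)} \\ u_t = \mathcal D'\tilde X_t & \text{(regulated output)} \end{cases} \tag{62}$$

with $\tilde X_0 = [W', 0']'$.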



See Fig. 7 for the block diagrams. It is a control system in which we want to minimize the power of the regulated output u by appropriately choosing $L_t$. More specifically, one may view $e_t$ as the noisy measurement, which is also the input to the controller; $(-L_t)$ as the time-varying controller gain; and $(-L_t e_t)$ as the controller's output, which is also the input to the system with state $\tilde X_t$. The objective is to minimize $\mathbb E\|u^T\|^2$; more formally, we want to solve

$$\min_{L_0, \ldots, L_T} \frac{1}{T+1}\mathbb E\|u^T\|^2 \quad \text{s.t. (62)}, \tag{63}$$

in which A and C are given and W is unknown. Note that the control effort is "free", as there is no direct penalty on the controller's output $(-L_t e_t)$. This is a cheap control problem, which is useful for characterizing the steady-state solution, and it is equivalent to the KF problem (see [44]; also see [21] for a discussion of cheap control and the closely related expensive control and minimum-energy control).
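The cheap control view can be made concrete with one time step of (62). For a given $\Sigma_t := \mathbb E\tilde X_t\tilde X_t'$, any gain L produces the next error covariance $(\mathcal A - L\mathcal H')\Sigma_t(\mathcal A - L\mathcal H')' + LL'$; completing the square shows that it exceeds its value at the KF gain (64) by $K_{e,t}(L - L_t)(L - L_t)'$, so the KF gain minimizes the regulated-output variance at every step. The Python/NumPy sketch below checks this with randomly generated, hypothetical $\mathcal A$, $\mathcal H$, $\Sigma_t$, and $\mathcal D$.

```python
import numpy as np


def next_error_cov(Acal, Hcal, Sigma, L):
    """Covariance of X~_{t+1} = (Acal - L Hcal') X~_t - L N_t, with N_t ~ N(0, 1) independent of X~_t."""
    Acl = Acal - np.outer(L, Hcal)
    return Acl @ Sigma @ Acl.T + np.outer(L, L)


def kf_gain(Acal, Hcal, Sigma):
    """Gain (64): L_t = Acal Sigma_t Hcal / K_{e,t}, with K_{e,t} = Hcal' Sigma_t Hcal + 1."""
    return Acal @ Sigma @ Hcal / (Hcal @ Sigma @ Hcal + 1.0)


rng = np.random.default_rng(7)
d = 4
Acal, Hcal = rng.standard_normal((d, d)), rng.standard_normal(d)
M = rng.standard_normal((d, d)); Sigma = M @ M.T
D = rng.standard_normal(d)                          # plays the role of script D in (61)
L_opt = kf_gain(Acal, Hcal, Sigma)
L_other = L_opt + 0.1 * rng.standard_normal(d)
ke = Hcal @ Sigma @ Hcal + 1.0
gap = next_error_cov(Acal, Hcal, Sigma, L_other) - next_error_cov(Acal, Hcal, Sigma, L_opt)
print(np.allclose(gap, ke * np.outer(L_other - L_opt, L_other - L_opt)))   # True
print(D @ gap @ D >= 0)                                                     # True
```

At the optimal gain the next covariance reduces to $\mathcal A\Sigma_t\mathcal A' - K_{e,t}L_tL_t'$, which is the Riccati recursion (65).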



[Figure: (a) and (b) cheap control loops with measurement $e_t$, regulated output $u_t$, and gains $-L_{1,t}$ and $-L_t$]

Fig. 7. Two equivalent block diagrams for the cheap control system. In (a) the block $(A, -L_{1,t}, C', 0)$ denotes the state-space representation with $\tilde x_t$ and W being its states at time t and at time 0, respectively.
The signal $e_t$ in (62) is the KF innovation, or simply innovation.¹² One fact is that $\{e_t\}$ is a white process, that is, its covariance matrix is diagonal. Another fact is that $e^T$ and $\tilde y^T$ determine each other causally, and we can easily verify that $h(e^T) = h(\tilde y^T)$ and $\det K_{\tilde y}^{(T)} = \det K_e^{(T)}$. We remark that (62) is the innovations representation of the KF (cf. [23]).

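For each t, the optimal $L_t$ is determined as

$$L_t = \frac{\mathcal A \Sigma_t \mathcal H}{K_{e,t}}, \tag{64}$$

where $\Sigma_t := \mathbb E\tilde X_t\tilde X_t'$, $K_{e,t} := \mathbb E(e_t)^2 = \mathcal H'\Sigma_t\mathcal H + 1$, and the error covariance matrix $\Sigma_t$ satisfies the Riccati recursion

$$\Sigma_{t+1} = \mathcal A\Sigma_t\mathcal A' - \frac{\mathcal A\Sigma_t\mathcal H\mathcal H'\Sigma_t\mathcal A'}{\mathcal H'\Sigma_t\mathcal H + 1} \tag{65}$$

with initial condition

$$\Sigma_0 = \begin{bmatrix} I_{n+1} & 0 \\ 0 & 0 \end{bmatrix}. \tag{66}$$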

This completes the description of the optimal feedback generator for a given $(A, C)$.
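The recursion (64) to (66) is short enough to code directly. The Python/NumPy sketch below runs it on the stacked system (58) for hypothetical example values of (A, C) and of the channel (F, G, H') in the canonical form (26), and compares two of its by-products with batch formulas built from $Z_T^{-1}$ and $\Gamma_T$: $\frac12\sum_t\log K_{e,t}$ against $\frac12\log\det(I + Z_T^{-1}K_r^{(T)}(Z_T^{-1})')$, which is the mutual information in (44), and the average of $C'\Sigma^{11}_tC$ (upper-left block of $\Sigma_t$) against the average one-step prediction error of $r_t$, which is the input power under the KF-based feedback generator.

```python
import numpy as np


def kalman_riccati(A, C, F, G, H, T):
    """Gains L_t (64), innovation variances K_{e,t}, and error covariances Sigma_t (65)-(66)."""
    n1, m = A.shape[0], F.shape[0]
    Acal = np.block([[A, np.zeros((n1, m))], [np.outer(G, C), F]])   # script A in (58)
    Hcal = np.concatenate([C, H])                                     # script H in (58)
    Sigma = np.zeros((n1 + m, n1 + m))
    Sigma[:n1, :n1] = np.eye(n1)                                      # (66)
    gains, Ke, Sigmas = [], [], []
    for _ in range(T + 1):
        ke = Hcal @ Sigma @ Hcal + 1.0                                # K_{e,t} = H' Sigma_t H + 1
        L = Acal @ Sigma @ Hcal / ke                                  # (64)
        gains.append(L); Ke.append(ke); Sigmas.append(Sigma)
        Sigma = Acal @ Sigma @ Acal.T - np.outer(L, L) * ke           # (65)
    return gains, np.array(Ke), Sigmas


def batch_reference(A, C, F, G, H, T):
    """Batch values: 0.5*logdet(K_ybar) and the average one-step prediction error of r_t."""
    h, Fk = [1.0], np.eye(F.shape[0])
    for _ in range(T):
        h.append(float(H @ Fk @ G)); Fk = F @ Fk                      # impulse response of Z^{-1}(z)
    n = T + 1
    Zinv = np.array([[h[i - j] if i >= j else 0.0 for j in range(n)] for i in range(n)])
    Gam = np.array([C @ np.linalg.matrix_power(A, t) for t in range(n)])   # rows C' A^t
    K_yb = Zinv @ Gam @ Gam.T @ Zinv.T + np.eye(n)
    K_ryb = Gam @ Gam.T @ Zinv.T
    pred = [Gam[0] @ Gam[0]]
    for t in range(1, n):
        pred.append(Gam[t] @ Gam[t] - K_ryb[t, :t] @ np.linalg.solve(K_yb[:t, :t], K_ryb[t, :t]))
    return 0.5 * np.linalg.slogdet(K_yb)[1], float(np.mean(pred))


A = np.array([[1.3, 0.0], [0.4, 0.7]]); C = np.array([1.0, 0.5])       # hypothetical encoder
F = np.array([[-0.3, 1.0], [-0.2, 0.0]]); G = np.array([0.7, 0.5]); H = np.array([1.0, 0.0])
_, Ke, Sigmas = kalman_riccati(A, C, F, G, H, T=8)
info, power = batch_reference(A, C, F, G, H, T=8)
print(np.isclose(0.5 * np.log(Ke).sum(), info),
      np.isclose(np.mean([C @ S[:2, :2] @ C for S in Sigmas]), power))   # True True
```

The recursive version costs O(T) per horizon, whereas the batch reference grows with the horizon, which is one practical reason the KF form is preferred.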

The existence of one unified expression for the three different systems (57), (56), and (62) is due to the fact that the first two are two different non-minimal realizations of the third. The input-output mappings from $N^T$ to $e^T$ in the three systems are T-equivalent (see Appendix II-B). Thus we say that the three problems, namely the optimal estimation problem, the optimal feedback generator problem, and the cheap control problem, are equivalent in the sense that, if any one of them is solved, then the other two are solved. Since the estimation problem and the control problem are well studied, the equivalence can sometimes facilitate our study of the communication problem. In particular, the formulation (62) yields alternative expressions for the mutual information and average channel input power in the feedback communication problem, as we see in the next section.

We further illustrate the relation between the estimation system and the communication system in Fig. 8, in which (b) is obtained from (a) by subtracting $f_t$ from the channel input and adding $(Z_T^{-1}f^T)_t$ back to the channel output, which does not affect the input, state, and output of $G_T^*$. It is clearly seen from the block diagram manipulations that the minimization of channel input power in the feedback communication problem becomes the minimization of MSE in the estimation problem. This generalizes the observation we made regarding how to obtain a coding structure from a KF over an AWGN channel (as shown in Fig. 1) to more general Gaussian channels.



B. Roles of the KF algorithm in feedback communication 

We have seen that the KF algorithm is necessary to ensure power efficiency in feedback communication. Here we show that it is also needed to recover the transmitted signal $x_0 := W$.

The estimation of $x_0$ is an (anti-causal) smoothing problem; more specifically, a fixed-point smoothing problem (cf., e.g., Ch. 10 of [23]), whose solution is typically easily obtained by studying the innovations process of the KF used for prediction. Note that $X_0 := [x_0', s_0']' := [W', 0']'$, and hence the smoothed estimate of $x_0$ can be obtained from the smoothed estimate of $X_0$ (the constraint $s_0 = 0$ should be automatically satisfied in the smoothing problem solution). Denote $\hat X_{0|t} := \mathbb E(X_0 \mid \tilde y^t)$ and $\hat x_{0|t} := \mathbb E(W \mid \tilde y^t)$. The solution is given below. Denote the closed-loop state transition matrices as $\Phi(t) := \mathcal A_{cl}(t-1)\mathcal A_{cl}(t-2)\cdots\mathcal A_{cl}(0)$ if $t > 0$ and $\Phi(0) := I$, where $\mathcal A_{cl}(t) := \mathcal A - L_t\mathcal H'$, and $\phi(t) := A_{cl}(t-1)A_{cl}(t-2)\cdots A_{cl}(0)$ if $t > 0$ and $\phi(0) := I$, where $A_{cl}(t) := A - L_{1,t}C'$. (It holds that $\phi(t)$ is the upper left block of $\Phi(t)$.) Then the smoothing equation is (see Problem 10.1 in [23])

$$\hat X_{0|t} = \hat X_{0|t-1} + \Sigma_0\Phi'(t)\mathcal H K_{e,t}^{-1} e_t, \tag{67}$$

which is based on the KF innovations.

12 The innovation defined here is consistent with the Kalman filtering literature but different from that defined in [6] or [12].

[Figure: (a) unknown source, channel T, and estimator; (b) the corresponding feedback communication loop]

Fig. 8. Relation between the estimation problem (a) and the communication problem (b).

The smoothed estimate, in our special case of no process noise, can alternatively be obtained simply by invoking the invariance property of MMSE estimation if $\det A \ne 0$ (as done in [14]). To see this, notice that $\hat x_{t+1}$ is the MMSE estimate of $x_{t+1}$ with one-step prediction, i.e., $\hat x_{t+1} = \mathbb E(x_{t+1} \mid \tilde y^t)$. Since $x_{t+1} = A^{t+1}W$, it holds that

$$\hat x_{0|t} = \mathbb E(W \mid \tilde y^t) = A^{-(t+1)}\hat x_{t+1} = A^{-(t+1)}(A\hat x_t + L_{1,t}e_t) = \hat x_{0|t-1} + A^{-(t+1)}L_{1,t}e_t. \tag{68}$$

The last equality, which specifies a recursive way to generate the smoothed estimate, is again based on the KF innovations.¹³ A similar equation holds for estimating $X_0$. A by-product of the above reasoning is the following identities, valid when $\det A \ne 0$:

$$L_t = \mathcal A^{t+1}\Sigma_0\Phi'(t)\mathcal H K_{e,t}^{-1}, \qquad L_{1,t} = A^{t+1}\phi'(t) C K_{e,t}^{-1}. \tag{69}$$

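The estimation MSE may be given by the following equations:

$$\begin{aligned} \mathrm{MMSE}_{W,t} &:= \mathbb E\big(W - \mathbb E(W \mid \tilde y^t)\big)\big(W - \mathbb E(W \mid \tilde y^t)\big)' \\ &= \mathrm{MMSE}_{W,t-1} - \phi'(t) C K_{e,t}^{-1} C'\phi(t) \\ &= \mathrm{MMSE}_{W,t-1} - K_{e,t} A^{-(t+1)} L_{1,t} L_{1,t}' (A^{-(t+1)})' \\ &= A^{-(t+1)}\Sigma^{11}_{t+1}(A^{-(t+1)})', \end{aligned} \tag{70}$$

where the last two equalities hold only if A is invertible, and $\Sigma^{11}_{t+1}$ is the upper left $(n+1)\times(n+1)$ block of $\Sigma_{t+1}$.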
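When A is invertible, (70) turns the smoothing MSE into a by-product of the prediction Riccati recursion. The Python/NumPy sketch below, with hypothetical example values (an invertible A with one stable eigenvalue, and a short horizon to stay clear of the numerical issue noted in footnote 13), compares $A^{-(t+1)}\Sigma^{11}_{t+1}(A^{-(t+1)})'$ with the batch formula of (83) applied with horizon t, and the accompanying test also checks the rank-one update in the third line of (70).

```python
import numpy as np


def riccati_upper_blocks(A, C, F, G, H, T):
    """For t = 0..T, return (K_{e,t}, L_{1,t}, Sigma^{11}_{t+1}) from (64)-(66)."""
    n1, m = A.shape[0], F.shape[0]
    Acal = np.block([[A, np.zeros((n1, m))], [np.outer(G, C), F]])
    Hcal = np.concatenate([C, H])
    Sigma = np.zeros((n1 + m, n1 + m))
    Sigma[:n1, :n1] = np.eye(n1)
    out = []
    for _ in range(T + 1):
        ke = Hcal @ Sigma @ Hcal + 1.0
        L = Acal @ Sigma @ Hcal / ke
        Sigma = Acal @ Sigma @ Acal.T - np.outer(L, L) * ke
        out.append((ke, L[:n1], Sigma[:n1, :n1]))
    return out


def mmse_W_from_kf(A, C, F, G, H, T):
    """Last line of (70): MMSE_{W,t} = A^{-(t+1)} Sigma^{11}_{t+1} A^{-(t+1)}' (A invertible)."""
    res = []
    for t, (_, _, S11) in enumerate(riccati_upper_blocks(A, C, F, G, H, T)):
        P = np.linalg.matrix_power(np.linalg.inv(A), t + 1)
        res.append(P @ S11 @ P.T)
    return res


def mmse_W_batch(A, C, F, G, H, t):
    """(I + Gamma_t' Z_t^{-1}' Z_t^{-1} Gamma_t)^{-1}, i.e. (83) with horizon t."""
    h, Fk = [1.0], np.eye(F.shape[0])
    for _ in range(t):
        h.append(float(H @ Fk @ G)); Fk = F @ Fk
    n = t + 1
    Zinv = np.array([[h[i - j] if i >= j else 0.0 for j in range(n)] for i in range(n)])
    Gam = np.array([C @ np.linalg.matrix_power(A, k) for k in range(n)])
    M = Zinv @ Gam
    return np.linalg.inv(np.eye(A.shape[0]) + M.T @ M)


A = np.array([[1.3, 0.0], [0.4, 0.7]]); C = np.array([1.0, 0.5])
F = np.array([[-0.3, 1.0], [-0.2, 0.0]]); G = np.array([0.7, 0.5]); H = np.array([1.0, 0.0])
kf = mmse_W_from_kf(A, C, F, G, H, T=6)
print(all(np.allclose(kf[t], mmse_W_batch(A, C, F, G, H, t)) for t in range(7)))   # True
```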

Remark 4. We now have a complete characterization of the roles of the KF algorithm in feedback communication. The KF of an unknown process driven by its initial condition and observed through a Gaussian channel with memory, when reformulated in an appropriate form, is optimal in transmitting information with feedback. The power efficiency (i.e., the minimization of the channel input power) in communication is guaranteed by the strictly causal one-step prediction operation in Kalman filtering (i.e., the operation that generates $\mathbb E(r_t \mid \tilde y^{t-1})$ at time t); and the optimal recovery of the transmitted codeword (optimal in the MMSE sense) is guaranteed by the anti-causal smoothing operation in Kalman filtering (i.e., the operation that generates $\mathbb E(x_0 \mid \tilde y^{t-1})$). We may view this characterization as the optimality of the KF in the sense of information transmission with feedback, which complements the existing characterization, established by Mitter and Newton in [42], that the KF is optimal in the sense of information processing. It is also interesting to note that, though different optimal coding schemes have been derived along different directions for different classes of channels, these schemes can be universally interpreted in terms of KFs of appropriate forms; see [22]. Thus, we consider the KF to act as a "unifier" for feedback communication schemes over various channels.

13 However, numerical problems may arise for large t if A has stable eigenvalues.

Finally, our study of the coding structure S also refines the CP structure. Indeed, we conclude that the CP structure needs to have a KF inside. We may further determine the optimal form of $B_T$. From (120) and (49), we have that

$$B_T^* = -G_T^*(A, C) Z_T^{-1}, \tag{71}$$

where $G_T^*(A, C)$ is the KF given in (57). Therefore, to achieve $C_T$ in the CP structure, it is sufficient to search for $(K_r^{(T)}, B_T)$ in the form of

$$B_T^* := -G_T^*(A, C) Z_T^{-1}. \tag{72}$$
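The form (71) can be checked against (33) and (49): substituting $B_T = -G_T^*(A, C)Z_T^{-1}$ into the feedback term $(I + B_T)^{-1}B_T Z_T$ of (33) should reproduce $\mathcal G_T^*(A, C)$. The Python/NumPy sketch below verifies this matrix identity for a hypothetical strictly lower-triangular matrix standing in for the KF; the identity holds for any such matrix, not only the KF.

```python
import numpy as np


def feedback_from_cp(B, Z):
    """Feedback matrix implied by (33): u^T = (I + B_T)^{-1} B_T Z_T y~^T + r^T."""
    return np.linalg.solve(np.eye(B.shape[0]) + B, B @ Z)


def feedback_from_estimator(Gbar, Zinv):
    """(49): G*_T = -Gbar (I - Z_T^{-1} Gbar)^{-1}."""
    return -Gbar @ np.linalg.inv(np.eye(Gbar.shape[0]) - Zinv @ Gbar)


rng = np.random.default_rng(8)
n = 6
Zinv = np.array([[0.5 ** (i - j) if i >= j else 0.0 for j in range(n)] for i in range(n)])
Z = np.linalg.inv(Zinv)
Gbar = 0.5 * np.tril(rng.standard_normal((n, n)), -1)   # stand-in for the KF G*_T(A, C)
B_star = -Gbar @ Zinv                                    # (71)
print(np.allclose(feedback_from_cp(B_star, Z), feedback_from_estimator(Gbar, Zinv)))   # True
```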

VII. Connections of fundamental limitations 

In this section, we discuss the connections of fundamental limitations. These limitations involve the mutual 
information in the feedback communication system, the Fisher information, MMSE, and CRB in the estimation 
system, and the Bode sensitivity integral in the feedback control system. We show that one limitation may be 
expressed in terms of the others, as a consequence of the equivalence established above. 

A. Fisher information matrix (FIM), CRB, and Bode-type sensitivity integral (sum) 

Let us first recall the general definitions of MMSE, Fisher information matrix (FIM), and CRB:

$$\mathrm{MMSE}_W := \mathbb E(W - \hat W)(W - \hat W)', \tag{73}$$

where $\hat W := \mathbb E(W \mid y)$ is the MMSE estimator of W based on the noisy observation y;

$$\mathcal I_W := \mathbb E\left\{\frac{\partial \log p_{W,y}(W, y)}{\partial W}\left(\frac{\partial \log p_{W,y}(W, y)}{\partial W}\right)'\right\} = -\mathbb E\left\{\frac{\partial^2 \log p_{W,y}(W, y)}{\partial W^2}\right\} \tag{74}$$

to be the (Bayesian) FIM, where $p_{W,y}(W, y)$ is the joint density of W and y; and

$$\mathrm{CRB}_W := \mathcal I_W^{-1} \tag{75}$$

to be the (Bayesian) CRB [24]. Note that it always holds, as a fundamental limitation in estimation theory, that

$$\mathrm{MSE}_W \succeq \mathrm{CRB}_W, \tag{76}$$

regardless of how one designs the estimator [24]. This inequality is referred to as the information inequality, Cramér-Rao inequality, or van Trees inequality.¹⁴
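For the linear Gaussian model that appears later in this section ($W \sim \mathcal N(0, I)$ observed as $y = MW + N$ with $N \sim \mathcal N(0, I)$), the Bayesian FIM (74) equals $I + M'M$, and (76) holds with equality for the MMSE estimator. The Python/NumPy sketch below, with a hypothetical observation matrix M, illustrates (76) by comparing the MSE matrix of an arbitrary linear estimator with $\mathrm{CRB}_W$.

```python
import numpy as np


def bayesian_fim(M):
    """FIM (74) for W ~ N(0, I), y = M W + N, N ~ N(0, I): prior term I plus data term M'M."""
    return np.eye(M.shape[1]) + M.T @ M


def linear_mse(M, K):
    """MSE matrix E(W - K y)(W - K y)' of the linear estimator W_hat = K y."""
    E = np.eye(M.shape[1]) - K @ M
    return E @ E.T + K @ K.T


rng = np.random.default_rng(10)
M = rng.standard_normal((5, 3))
crb = np.linalg.inv(bayesian_fim(M))                        # (75)
K_mmse = M.T @ np.linalg.inv(M @ M.T + np.eye(5))            # E(W | y) = K_mmse y
K_other = K_mmse + 0.1 * rng.standard_normal((3, 5))
print(np.allclose(linear_mse(M, K_mmse), crb))                           # True: equality in (76)
print(np.linalg.eigvalsh(linear_mse(M, K_other) - crb).min() >= -1e-12)  # True: MSE - CRB is PSD
```

The gap for any other linear estimator K is $(K - K_{\text{mmse}})(MM' + I)(K - K_{\text{mmse}})'$, which makes the positive semidefinite ordering in (76) explicit for this model.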

The Bode sensitivity integral is a fundamental limitation in feedback control (typically in steady state). Simply put, for any feedback design, the sensitivity of the output to exogenous disturbance cannot be made small uniformly over all frequencies, since the logarithm of the sensitivity transfer function's power spectrum integrates to a constant. See Section IX-B and [29]. A similar limitation holds in finite horizon, as we now show.

14 Some authors distinguish the Cramér-Rao inequality from the van Trees inequality by restricting the former to be non-Bayesian and unbiased and the latter to be Bayesian and possibly biased.
As

$$\tilde y^T = (I - Z_T^{-1}\mathcal G_T)^{-1}(Z_T^{-1} r^T + N^T), \tag{77}$$

the sensitivity of the channel output $\tilde y^T$ to the noise $N^T$ is $S_T := (I - Z_T^{-1}\mathcal G_T)^{-1}$. It is then easily seen that, if the spectrum (set of eigenvalues) of $S_T S_T'$ is $\{\lambda_i\}_{i=1}^{T+1}$, then

$$\sum_{i=1}^{T+1}\log\lambda_i = 0, \tag{78}$$

because $S_T$ is unit lower triangular and hence $\det(S_T S_T') = 1$. This holds regardless of the choice of the feedback generator $\mathcal G_T$, including the case in which there is no feedback (i.e., open loop). Thus, the effect of the noise $N^T$ cannot be made arbitrarily small in the measurements $\tilde y^T$, which may be viewed as a fundamental limitation of noise (or disturbance) suppression.

Since the noise $N^T$ is normalized, one may also define the sensitivity based on the spectrum of $K_{\tilde y}^{(T)}$ or on the innovation process variance $K_{e,t}$. Let

$$\mathrm{BI}_T := \frac12\sum_{i=1}^{T+1}\log\lambda_i\big(K_{\tilde y}^{(T)}\big) = \frac12\sum_{t=0}^{T}\log K_{e,t}, \tag{79}$$

which is easily seen to be independent of any causal feedback and is the finite-horizon counterpart of the widely known infinite-horizon Bode sensitivity integral.
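Both (78) and the feedback invariance of (79) reduce to the fact that $S_T$ is unit lower triangular. The Python/NumPy sketch below computes the sum of the log-eigenvalues of $S_TS_T'$ and the first expression in (79) for several hypothetical strictly lower-triangular feedback generators, including the open-loop case $\mathcal G_T = 0$; the example $Z_T^{-1}$ and $K_r^{(T)}$ are also illustrative.

```python
import numpy as np


def bode_sum(Zinv, Gcal):
    """Left-hand side of (78): sum of log-eigenvalues of S_T S_T', S_T = (I - Z_T^{-1} G_T)^{-1}."""
    S = np.linalg.inv(np.eye(Zinv.shape[0]) - Zinv @ Gcal)
    return np.sum(np.log(np.linalg.eigvalsh(S @ S.T)))


def bi_T(Zinv, Gcal, Kr):
    """First expression in (79): 0.5 * sum of log-eigenvalues of K_y~ (covariance of (77))."""
    n = Zinv.shape[0]
    S = np.linalg.inv(np.eye(n) - Zinv @ Gcal)
    K_yt = S @ (Zinv @ Kr @ Zinv.T + np.eye(n)) @ S.T
    return 0.5 * np.sum(np.log(np.linalg.eigvalsh(K_yt)))


rng = np.random.default_rng(12)
n = 6
Zinv = np.array([[0.5 ** (i - j) if i >= j else 0.0 for j in range(n)] for i in range(n)])
M = rng.standard_normal((n, 2)); Kr = M @ M.T
for scale in (0.0, 0.2, 0.5):                                # scale 0.0 is open loop
    Gcal = scale * np.tril(rng.standard_normal((n, n)), -1)
    print(round(bode_sum(Zinv, Gcal), 8), round(bi_T(Zinv, Gcal, Kr), 8))
```

Individual eigenvalues do move with the feedback generator (some grow while others shrink), which is the finite-horizon "waterbed" behavior; only their log-sum is pinned.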

B. Expressions for mutual information and channel input power 
We have the following proposition. 

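Proposition 3. Consider the coding structure S. For any fixed $0 \le n \le T$ and observable $(A, C)$ with $A \in \mathbb R^{(n+1)\times(n+1)}$, it holds that

i)

$$\begin{aligned} I(W; \tilde y^T) = \mathrm{BI}_T &= \frac12\sum_{t=0}^{T}\log(\mathcal H'\Sigma_t\mathcal H + 1) \\ &= -\frac12\log\det\mathrm{MMSE}_{W,T} = \frac12\log\det\mathcal I_{W,T} = -\frac12\log\det\mathrm{CRB}_{W,T}; \end{aligned} \tag{80}$$

ii)

$$\frac{1}{T+1}\mathbb E\|u^T\|^2 = \frac{1}{T+1}\operatorname{trace}\big(\mathrm{PMMSE}_{r,T}\big) = \frac{1}{T+1}\sum_{t=0}^{T} C'A^t\,\mathrm{MMSE}_{W,t-1}\,(A^t)'C, \tag{81}$$

where $\mathrm{MMSE}_{W,T}$ is the minimum MSE of W at time T (with $\mathrm{MMSE}_{W,-1} := I_{n+1}$), $\mathrm{PMMSE}_{r,T}$ is the causal (one-step prediction) minimum MSE of $r^T$ at time T, $\mathcal I_{W,T}$ is the Bayesian Fisher information matrix of W at time T for the estimation system (57), and $\mathrm{CRB}_{W,T}$ is the Bayesian CRB of W at time T.

Note that $\mathrm{PMMSE}_{r,T} := \mathbb E(r^T - \hat r^T)(r^T - \hat r^T)'$, in which $\hat r^T = [\hat r_0, \ldots, \hat r_T]'$ contains the (strictly causal) one-step prediction estimates $\hat r_t := C'\mathbb E(x_t \mid \tilde y^{t-1})$ for $t = 0, \ldots, T$.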






Remark 5. This proposition connects the mutual information to the Bode sensitivity integral of the associated control problem and to the innovations process, Fisher information, (minimum) MSE, and CRB of the associated estimation problem. Note that no mutual information larger than the value given above is possible, regardless of how one designs the feedback generator; the amount of mutual information we may obtain is limited by the fundamental limitations of the control problem and by how well the estimation can be done, and hence by the Fisher information, MMSE, and CRB. Thus the fundamental limitation in feedback communication is linked to the fundamental limitations in control and estimation.

This proposition also shows that the spectrum of the output covariance matrix, or the innovation variance, cannot be made uniformly large or small, which may be viewed as the finite-horizon, time-domain counterpart of the Bode sensitivity integral in the steady state and frequency domain. Notice that so far the estimation problem and control problem do not rely on asymptotic notions such as stability (stability was used to establish the Bode-Shannon connections between feedback communication and feedback stabilization in steady state [14]).

As a side note, if one defines the complementary sensitivity as $\mathcal T_T := Z_T^{-1}\mathcal G_T(I - Z_T^{-1}\mathcal G_T)^{-1}$, it still holds that $S_T - \mathcal T_T = I$, which resembles the fundamental algebraic tradeoff in the steady state and frequency domain (cf. [29]).

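Proof: i) First we simply notice that $h(y^T) = h(e^T)$, and $K_{e,t} = C'\Sigma_t C + 1$. Next, to find the MMSE of $W$, note that in Fig. (a)

$$y^T = Z_T^{-1}\Gamma_T W + N^T, \qquad (82)$$

and that $W \sim \mathcal{N}(0, I)$, $N^T \sim \mathcal{N}(0, I)$. Thus, by [43] we have

$$\mathrm{MMSE}_{W,T} = \big(I + \Gamma_T'(Z_T^{-1})'Z_T^{-1}\Gamma_T\big)^{-1} = I - \Gamma_T'(Z_T Z_T' + \Gamma_T\Gamma_T')^{-1}\Gamma_T, \qquad (83)$$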



yielding

$$\det\mathrm{MMSE}_{W,T} = \det\big(I + Z_T^{-1}\Gamma_T\Gamma_T'(Z_T^{-1})'\big)^{-1} = \det\big(I + Z_T^{-1}K_r(Z_T^{-1})'\big)^{-1}. \qquad (84)$$

Besides, from Section 2.4 in [24] we can directly compute the FIM of $W$ to be $I + \Gamma_T'(Z_T^{-1})'Z_T^{-1}\Gamma_T$. Then i) follows from Proposition Q] and (62).

ii) Since $u_t = C'\tilde{x}_t = r_t - \hat{r}_t$ and $\mathbb{E}\tilde{x}_t\tilde{x}_t' = \Sigma_t$, we have $\mathbb{E}(u_t)^2 = C'\Sigma_t C = \mathbb{E}(r_t - \hat{r}_t)^2$, and then ii) follows. □
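The two forms of $\mathrm{MMSE}_{W,T}$ in (83), and the fact that $\frac{1}{2}\log\det\mathrm{MMSE}_{W,T}^{-1}$ equals half the sum of the log innovations variances, can be checked numerically. The sketch below is illustrative only: it uses a random matrix $H$ as a stand-in for $Z_T^{-1}\Gamma_T$ (an assumption, not the paper's channel), and reads the innovations variances $K_{e,t}$ off the Cholesky factor of $K_y = I + HH'$.

```python
import numpy as np

def gaussian_model_identities(H):
    """For y = H W + N with W ~ N(0, I) and N ~ N(0, I), return the two forms
    of the MMSE matrix in (83), (1/2) log det MMSE^{-1}, and (1/2) sum_t log K_{e,t}."""
    m, n = H.shape                                       # m = T + 1 samples, n = dim(W)
    fim = np.eye(n) + H.T @ H                            # Bayesian Fisher information of W
    mmse_a = np.linalg.inv(fim)                          # first form in (83)
    K_y = np.eye(m) + H @ H.T                            # covariance of y^T
    mmse_b = np.eye(n) - H.T @ np.linalg.solve(K_y, H)   # second form in (83)
    # Innovations variances: with K_y = G G' (Cholesky), K_{e,t} = G[t, t]^2.
    K_e = np.diag(np.linalg.cholesky(K_y)) ** 2
    half_logdet = 0.5 * np.linalg.slogdet(fim)[1]        # (1/2) log det MMSE^{-1}
    half_sum_log_ke = 0.5 * np.sum(np.log(K_e))
    return mmse_a, mmse_b, half_logdet, half_sum_log_ke

rng = np.random.default_rng(0)
H = rng.standard_normal((6, 3))   # hypothetical stand-in for Z_T^{-1} Gamma_T (T = 5, dim W = 3)
mmse_a, mmse_b, lhs, rhs = gaussian_model_identities(H)
print(np.allclose(mmse_a, mmse_b), np.isclose(lhs, rhs))
```

Both comparisons should print `True`: the log-determinant identity is Sylvester's $\det(I + H'H) = \det(I + HH')$ combined with $\det K_y = \prod_t K_{e,t}$.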



C. Connections of the fundamental tradeoffs 

The above fundamental limitations are based on one fixed $(A, C)$ with $A \in \mathbb{R}^{(n+1)\times(n+1)}$. Searching over all admissible $(A, C)$ with $A \in \mathbb{R}^{(n+1)\times(n+1)}$ for all $n \le T$, one can obtain the optimal tradeoffs for feedback communication, estimation, and feedback control, as well as the corresponding relations among these tradeoffs. Note that the linear scheme with $(A, C)$ can attain the optimal tradeoffs, as we have established for the feedback communication system (see Proposition 2); hence the optimal tradeoffs obtained by searching over all admissible $(A, C)$ are indeed the optimal tradeoffs over all (possibly nonlinear, provided the relevant quantities are well defined) feedback communication designs, estimator designs, and feedback control designs. These fundamental tradeoffs are elaborated below.

The fundamental tradeoff in the feedback communication problem over the channel T for finite horizon from time 0 to time $T$ is the capacity $C_{T,n}(\mathcal{P})$ (or $\mathcal{P}_{T,n}(\mathcal{R})$; see Definition 1) in the form of the optimal power-rate pair. (As indicated by Proposition 2, searching over all admissible $(A, C)$ achieves the capacity.) That is, we have:
(T1) Optimal Feedback Communication Tradeoff: Given the channel T with one-step delayed output feedback and an average channel input power $\mathcal{P}$, the achievable information rate $R_T(f, \mathcal{P})$ cannot be higher than a constant $C_{T,n}(\mathcal{P})$ for any feedback communication design $f$; here $R_T(f, \mathcal{P}) := \frac{1}{T+1}I(u^T(f) \to y^T(f))$ is the information rate with feedback design $f$ such that $\frac{1}{T+1}\mathbb{E}\|u^T(f)\|^2 \le \mathcal{P}$.
Alternatively

(T1') Optimal Feedback Communication Tradeoff: Given the channel T with one-step delayed output feedback and an information rate $\mathcal{R}$, the achievable average channel input power $P_T(f, \mathcal{R})$ cannot be lower than a constant $\mathcal{P}_{T,n}(\mathcal{R})$ for any feedback communication design $f$; here $P_T(f, \mathcal{R}) := \frac{1}{T+1}\mathbb{E}\|u^T(f)\|^2$ is the average channel input power with feedback design $f$ such that $\frac{1}{T+1}I(u^T(f) \to y^T(f)) \ge \mathcal{R}$.
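To make (T1) and (T1') concrete, consider the simplest instance, assumed here only for illustration: white channel noise ($Z_T = I$) and a scalar pair $(A, C) = (a, 1)$ with $|a| > 1$. A short calculation with the steady-state Riccati equation (94) gives the power-rate pair, in nats per channel use:

$$\begin{aligned}
\Sigma &= a^2\Sigma - \frac{a^2\Sigma^2}{\Sigma + 1} \;\Longrightarrow\; \Sigma^2 + \Sigma = a^2\Sigma \;\Longrightarrow\; \Sigma = a^2 - 1 \quad (\Sigma \neq 0),\\
\mathcal{P} &= C'\Sigma C = a^2 - 1, \qquad \mathcal{R} = \tfrac{1}{2}\log(C'\Sigma C + 1) = \log|a| = \tfrac{1}{2}\log(1 + \mathcal{P}).
\end{aligned}$$

Sweeping $|a| > 1$ traces out $\mathcal{R} = \frac{1}{2}\log(1 + \mathcal{P})$, the familiar power-rate tradeoff of the AWGN channel, which feedback schemes of Schalkwijk-Kailath type attain.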

Note that the average input power depends on the strictly causal feedback from the channel output; the information rate, however, is independent of the causal feedback, can be achieved by anti-causally processing the channel outputs $y^T$, and hence can be used as a measure of the anti-causality of the system.

A fundamental tradeoff for the estimation problem over the channel T is causal estimation performance versus anti-causal estimation performance. Assume a process $r^T$ is passed through the channel T and generates measurements $y^T$. Let $W := R^{-1}r^T$, where $R := K_r^{1/2}$ if $K_r$ is of full rank; otherwise $R$ is chosen such that $K_W$ is of full rank with $\operatorname{rank}(K_W) = \operatorname{rank}(K_r)$ and $K_W = I$. That is, $W$ may be viewed as the to-be-estimated, normalized signal that completely determines the process $r^T$. Therefore we have a linear model $y^T = Z_T^{-1}RW + N^T$. Again one can define the innovation as $e_t := y_t - \mathbb{E}(y_t \mid y^{t-1})$ for each $t$.
(T2) Optimal Estimation Tradeoff: Given the channel T and the time-averaged one-step prediction MMSE

$$\mathrm{PMMSE}_r := \frac{1}{T+1}\sum_{t=0}^{T}\mathbb{E}\big(r_t - \mathbb{E}(r_t \mid y^{t-1})\big)^2, \qquad (85)$$

the decay rate of the anti-causal, smoothing MMSE

$$\frac{1}{2(T+1)}\log\det\mathrm{MMSE}_W^{-1} = -\frac{1}{2(T+1)}\log\det\mathbb{E}\big(W - \mathbb{E}(W \mid y^T)\big)\big(W - \mathbb{E}(W \mid y^T)\big)' \qquad (86)$$

cannot be larger than a constant, and the average of the innovations variance in log scale, $\frac{1}{2(T+1)}\sum_{t=0}^{T}\log K_{e,t}$, cannot be larger than a constant, for any one-step predictor design and smoother design.

Alternatively 

(T2') Optimal Estimation Tradeoff: Given the channel T and the decay rate of the anti-causal, smoothing MMSE $\frac{1}{2(T+1)}\log\det\mathrm{MMSE}_W^{-1}$ (or the average of the innovations variance in log scale, $\frac{1}{2(T+1)}\sum_{t=0}^{T}\log K_{e,t}$), the time-averaged one-step prediction MMSE $\mathrm{PMMSE}_r$ cannot be smaller than a constant, for any one-step predictor design and smoother design.

Note that the prediction MMSE depends on causality, while the smoothing MMSE is anti-causal and independent of the causal processing (if any) done by the estimator. That is, this is a prediction versus smoothing tradeoff, or, more fundamentally, a causality versus anti-causality tradeoff.

A fundamental tradeoff for the cheap control problem over the channel T is the control performance (the regulated output variance, in this case the variance of the channel input signal) versus the Bode integral (or the disturbance rejection measure, i.e., the degree of anti-causality, as defined in (79)). View the channel input $u_t(f)$ as the regulated output with control design $f$, let $y_t(f)$ be the associated channel output, and define

1 T+l 

BIt(/):= log Mtff )(/)). (87) 

i=i 

(T3) Optimal Feedback Control Tradeoff: Given the channel T and the average regulated output variance $\frac{1}{T+1}\sum_{t=0}^{T}\mathbb{E}(u_t(f))^2$, the Bode integral $\mathrm{BI}_T(f)$ cannot be larger than a constant for any control design $f$.
Alternatively 

(T3') Optimal Feedback Control Tradeoff: Given the channel T and the Bode integral, the average regulated output variance cannot be smaller than a constant for any control design $f$.
Note that this specifies the relation between the control performance achievable via causal feedback and the anti-causality of the system (that is, the Bode sensitivity integral or disturbance rejection measure, which is independent of causal feedback).

To summarize, all three tradeoffs are essentially the fundamental tradeoff between causality and anti-causality, which manifests itself in three different but closely related problems. The causal entities, e.g., the channel inputs in feedback communication, the one-step prediction in estimation, and the regulated output in control, are closed-loop entities generated in a causal, progressive way by the causal feedback, and hence vary as the causal feedback varies. On the other hand, the anti-causal entities, e.g., the information rate (and the decoded message) in communication, the smoothed estimate in estimation, and the BI in control, are invariant regardless of whether the system is in open loop or closed loop, and regardless of how the loop is closed. It is worth noting the various discussions of causal versus anti-causal operations and of filtering versus smoothing in the literature; see [24], [25] and references therein.

In contrast, the power versus rate tradeoff in communication problems without output feedback cannot be interpreted as a causality versus anti-causality tradeoff, nor can the tradeoff in the corresponding estimation problems. To see this, we again assume the linear Gaussian model $y^T = Z_T^{-1}RW + N^T$. The channel input power is related to the unknown's prior covariance (i.e., the covariance matrix of the channel input $RW$), whereas the mutual information $I(W; y^T) = \frac{1}{2}\log\det\mathrm{MMSE}_W^{-1}$ is related to the posterior covariance (cf. Theorem 10.3, [43]). Thus, in communication without output feedback, the power versus rate tradeoff may be translated into the tradeoff between the unknown's prior covariance and posterior covariance (or, more generally, between the unknown's prior and posterior distributions). It is easily verified that these two tradeoffs coincide in the AWGN channel case, as one might expect.

VIII. Necessary conditions for the optimality of the finite-horizon coding structure $\mathcal{S}^*$

We discuss in this section a few useful properties of the coding structure $\mathcal{S}^*$ with the optimal feedback generator. The first two properties, i.e., the orthogonality between future channel inputs and previous channel outputs, and the Gauss-Markov property of the transformed channel outputs, are direct consequences of the KF. Naturally, they can be viewed as necessary conditions for optimality of the feedback communication scheme, as we have proven the necessity of the KF for optimality. The third property, the finite-dimensionality of the optimizing $r^T$, is yet another necessary condition for optimality and is a joint consequence of the KF structure and the waterfilling requirement for optimality over the finite-dimensional channel T. Finally, we show that the MMSE one-step predictor is necessary for achieving the feedback capacity of general additive channels with an average power constraint, followed by an extension of the orthogonality property to such channels.



A. Necessary condition for optimality: Orthogonality condition 

First, we show that the coding structure $\mathcal{S}^*$ satisfies a necessary condition for optimality discussed in [15]. The condition says that the channel input $u_t$ needs to be orthogonal to the past channel outputs $y^{t-1}$. This is intuitive: to ensure the fastest transmission, the transmitter should not (re-)transmit any information that the receiver has already obtained; thus the transmitter needs to remove from $u_t$ any correlation with $y^{t-1}$ (to this aim, the transmitter has to access the channel outputs through feedback). This property, albeit a rather simple consequence of the Kalman filter, can yield interesting results; see, e.g., [33].

15 This was later referred to as the orthogonality condition in [33], based on which a Kalman filter structure is identified. It 
was also discussed in [12], [45]. 
Proposition 4. In system (56), for any $0 \le \tau < t$, it holds that $\mathbb{E}u_t e_\tau = 0$ and $\mathbb{E}u_t y_\tau = 0$. Equivalently, the matrices $\mathbb{E}u^T y^{T\prime}$ and $\mathbb{E}u^T e^{T\prime}$ are upper triangular for any $T$.

The justification of this proposition follows simply from the famous Projection Theorem (for MMSE estimators, the estimation error at a time is orthogonal to all available measurements; see, e.g., [23], [43], [46]), which holds for the KF. Here note that $u_t$ is in fact the one-step prediction error (i.e., $u_t = C'\tilde{x}_t$, where $\tilde{x}_t$ is the error of the one-step prediction estimator $\mathbb{E}(x_t \mid y^{t-1})$ for $x_t$). We also provide an alternative proof based on the state-space model in the appendix.
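Proposition 4 can be checked exactly, without Monte Carlo, by tracking each signal as a linear combination of the primitive random variables. The sketch below assumes a hypothetical white-noise instance ($Z_T = I$, so the channel output $y_t = u_t + N_t$ coincides with the innovation $e_t$) written in error coordinates as in the time-varying form of (92); the pair $(A, C)$, the prior covariance, and the horizon are arbitrary illustrative choices.

```python
import numpy as np

def kf_scheme_moments(A, C, Sigma0, T):
    """Exact second moments of a hypothetical white-noise instance of the
    KF-based scheme, written in error coordinates:
        x_{t+1} = A x_t - L_t e_t,  e_t = C' x_t + N_t,  u_t = C' x_t,
    with x_0 ~ N(0, Sigma0), N_t i.i.d. N(0, 1), and L_t from the Riccati recursion.
    Every signal is stored as a coefficient row over the basis (x_0, N_0, ..., N_T)."""
    n = A.shape[0]
    nb = n + T + 1
    cov_basis = np.eye(nb)
    cov_basis[:n, :n] = Sigma0
    X = np.hstack([np.eye(n), np.zeros((n, T + 1))])      # coefficients of x_t
    Sigma = Sigma0.copy()
    U = np.zeros((T + 1, nb))                             # rows: u_t
    Ecoef = np.zeros((T + 1, nb))                         # rows: e_t
    power = np.zeros(T + 1)                               # C' Sigma_t C
    for t in range(T + 1):
        U[t] = C @ X
        Ecoef[t] = U[t]
        Ecoef[t, n + t] += 1.0
        power[t] = C @ Sigma @ C
        K_e = power[t] + 1.0
        ASC = A @ Sigma @ C
        L = ASC / K_e                                     # Kalman gain, cf. (93)
        X = A @ X - np.outer(L, Ecoef[t])                 # error-state update
        Sigma = A @ Sigma @ A.T - np.outer(ASC, ASC) / K_e    # Riccati recursion
    Eue = U @ cov_basis @ Ecoef.T                         # [t, tau] = E u_t e_tau
    Eee = Ecoef @ cov_basis @ Ecoef.T                     # [t, tau] = E e_t e_tau
    return Eue, Eee, power

A = np.array([[1.5, 0.3], [0.0, 0.8]])
C = np.array([1.0, 1.0])
Eue, Eee, power = kf_scheme_moments(A, C, np.eye(2), T=10)
print(np.allclose(np.tril(Eue, -1), 0.0, atol=1e-9))            # Proposition 4
print(np.allclose(Eee - np.diag(np.diag(Eee)), 0.0, atol=1e-9))  # white innovations
print(np.allclose(np.diag(Eue), power))                          # E u_t^2 = C' Sigma_t C
```

All three checks should print `True`: the strictly lower triangle of $[\mathbb{E}u_t e_\tau]$ vanishes, the innovations are white, and the per-step power equals $C'\Sigma_t C$.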

Proof: See Appendix HVl □ 

B. Gauss-Markov property of the transformed output process 

In this subsection, we show that the process $\tilde{y}^T$, a transformation of the output process $y^T$ or $\bar{y}^T$, is a Gauss-Markov (GM) process. In particular, it is an MA-m Gaussian process. This is a generalization of the result obtained in [9], which states that if the channel has an MA-m Gaussian noise process and has no ISI, a necessary condition for optimality is that the channel output needs to be an MA-m Gaussian process; see Corollary IV.1 in [33] for the detailed statement and proof of the result of [9]. This result has been generalized in [33]: if the channel has an mth-order autoregressive moving-average (ARMA-m) Gaussian noise process and has no ISI, a necessary condition for optimality is that the channel output needs to be an ARMA-m Gaussian process; see Proposition VII.1 in [33]. Our result here, on the other hand, is concerned with a transformed output, which is sometimes simpler to deal with.

Recall the relevant definitions in (21) and (22) of Section III, and define the transformed output process

$$\tilde{y}^T := Z_{z,T}\,\bar{y}^T. \qquad (88)$$

From (22), it holds that

$$\tilde{y}^T = Z_{z,T}(Z_T^{-1}u^T + N^T) = Z_{p,T}u^T + Z_{z,T}N^T. \qquad (89)$$

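This implies that $\tilde{y}_{t+m+1}$ is a linear combination of $u_{t+1}^{t+m+1}$ and $N_{t+1}^{t+m+1}$, and $\tilde{y}_t$ is a linear combination of $u_{t-m}^{t}$ and $N_{t-m}^{t}$, since $Z_{p,T}$ and $Z_{z,T}$ are banded (and lower triangular) with bandwidth $(m+1)$. But the Projection Theorem yields that $u_{t+1}^{t+m+1}$ is orthogonal to $\tilde{y}_t$, and so is the future noise $N_{t+1}^{t+m+1}$; since all signals are jointly Gaussian, $\tilde{y}_{t+m+1}$ is independent of $\tilde{y}_t$. Repeating this argument, we can show that $\tilde{y}^T$ is a banded process, i.e., an MA-m process. More formally, we have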

Proposition 5. In system (56), the transformed output process $\tilde{y}^T$ is an MA-m Gaussian process, or equivalently

$$K_{\tilde{y}} := \mathbb{E}\tilde{y}^T\tilde{y}^{T\prime} \qquad (90)$$

is banded with bandwidth $(2m+1)$, i.e., $[K_{\tilde{y}}]_{ij} = 0$ if $|i - j| \ge m + 1$.



Proof: See Appendix IV. □
As a result of this proposition, we see that $y^T$ is an ARMA-m process, as claimed in [33].
The different forms of channel outputs, i.e., $y^T$, $\bar{y}^T$, and $\tilde{y}^T$, causally determine each other; see Fig. 9 for their relations. Fig. 9(a) shows the ISI-free colored Gaussian noise channel with a direct channel output $y^T$ and a transformed output $\bar{y}^T$. Since this channel has no ISI, the optimal effective input process must waterfill the effective noise spectrum, and hence $y^T$ is the waterfilling output for the optimal scheme. Fig. 9(b) shows the ISI channel corrupted by AWGN, with a channel output $\bar{y}^T$. Since the channel noise is white, it may be easy to directly apply the KF algorithm. Fig. 9(c) shows an ISI channel corrupted by colored Gaussian noise with a channel output $\tilde{y}^T$, where both the ISI filter and the filter generating the colored noise are MA-m filters. This formulation may be easily used to establish that $\tilde{y}^T$ is an MA-m process. These formulations are T-equivalent and can be easily converted from one to another.
Fig. 9. (a) A colored Gaussian noise channel without ISI. This formulation may be directly used to study the waterfilling property of the optimal solution. (b) An equivalent ISI channel with AWGN. This formulation may be easily used to study the KF properties of the optimal solution. (c) Another equivalent channel model with both ISI and colored noise, where the ISI and colored noise filters are both MA-m filters. This formulation may be used to study the finite-dimensionality of the channel
input/output processes. Note that Zt can be realized as (F — GH \ —G,H_', 1), Z^, 1 as (F,G,H_', 1), Z Pi t as {F z ,G p ,H_', 1), 
as (F + G z I?,G p ,H',l), and Z X , T as (F Z ,G Z ,H!,1) 



C. Finite dimensionality of the optimizing r 

We now show that, to achieve the finite-horizon feedback capacity $C_{T,n}$, the covariance matrix of the feedback-free, message-carrying process $r^T$ can have rank at most $(m+1)$, where $m$ is the order of the channel $Z(z)$. This extends the finite-rankness property established by Ordentlich (cf. [9], [33]) from a Gaussian channel with an MA-m noise process to a Gaussian channel with an ARMA-m noise process.

Proposition 6. For system (56), the optimal $K_r$ that solves $C_{T,n}$ as defined in (17) has rank at most $(m+1)$.

The proof of this proposition is based on Lemma 2 below. This lemma deals with a special class of $m$th-order channels $Z_T$, namely any $Z_T$ such that $(f_0 + g_0) = 0$ (see Sec. III for notation). In other words, $Z_{p,T}$ is in fact an MA-$(m-1)$ model. For this class of channels, it is easy to extend the idea of Ordentlich (cf. [9]) and prove that the optimal $K_r$ has rank at most $m$. Then the proposition can be proven by approaching an arbitrary $Z_T$ by elements of this special class of channels, based on certain continuity properties.

Lemma 2. For system (56) with $(f_0 + g_0) = 0$, the optimal $K_r$ that solves $C_{T,n}$ as defined in (17) has rank at most $m$.

See Appendix IV-A for the proofs of the lemma and the proposition.

D. Necessity of the MMSE predictor for general channels with feedback 

The necessity of the Kalman filter for achieving optimality over the channel T under an average power constraint can be easily extended. Consider an arbitrary additive channel

$$y^T = \mathcal{H}u^T + Z^T \qquad (91)$$

with an average power constraint $\mathbb{E}\|u^T\|^2 \le (T+1)\mathcal{P}$, where $Z^T$ is an arbitrary additive noise process. Assuming one-step delayed channel output feedback, an optimal feedback coding scheme for such a channel needs to contain an MMSE one-step predictor in order to achieve the feedback capacity.

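Proposition 7. Let $\tilde{u}_t := u_t - \mathbb{E}(u_t \mid y^{t-1})$ and $\tilde{y}^T := \mathcal{H}\tilde{u}^T + Z^T$. Then $I(\tilde{u}^T \to \tilde{y}^T) = I(u^T \to y^T)$ and $\mathbb{E}u^{T\prime}u^T \ge \mathbb{E}\tilde{u}^{T\prime}\tilde{u}^T$.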

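Proof: Note that $\mathbb{E}(u_t \mid y^{t-1})$ is a function of past outputs only, so it can be generated and added back to the channel output at the receiver side; hence the directed information across the channel, or the mutual information from the message to the channel outputs, is the same whether $u^T$ or $\tilde{u}^T$ is used as the channel input. The average power when using $\tilde{u}^T$ is no larger, since $\tilde{u}_t$ is the MMSE prediction error and thus has the minimum second moment among all $u_t - g(y^{t-1})$. □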

Simple as it is, this necessary condition for optimality is rather universal. A corollary is that in the optimal feedback coding scheme the current channel input $u_t$ is orthogonal to all past channel outputs $y^{t-1}$ by the Projection Theorem, an extension of Proposition 4. Moreover, since $\mathbb{E}u_t = 0$ by the law of total expectation, it is a center-of-gravity encoding rule (cf. [12], [37]). It is also straightforward to see that if the channel output feedback delay is $d$ steps, then an MMSE $d$-step predictor is needed for optimality.
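In the jointly Gaussian case the conditional mean in Proposition 7 is linear, so the argument can be illustrated with covariances alone. The sketch assumes a random joint covariance for $(u_t, y^{t-1})$ (purely hypothetical numbers), forms $\tilde{u}_t = u_t - g'y^{t-1}$ with the linear MMSE predictor $g$, and checks that $\tilde{u}_t$ is orthogonal to $y^{t-1}$ while its second moment does not exceed that of $u_t$.

```python
import numpy as np

rng = np.random.default_rng(1)
k = 4                                   # hypothetical length of the past output y^{t-1}
M = rng.standard_normal((k + 1, k + 1))
K = M @ M.T + 1e-3 * np.eye(k + 1)      # joint covariance of (u_t, y^{t-1}); index 0 is u_t
K_uu, K_uy, K_yy = K[0, 0], K[0, 1:], K[1:, 1:]

# For jointly Gaussian variables, E(u_t | y^{t-1}) = g' y^{t-1} with K_yy g = K_yu.
g = np.linalg.solve(K_yy, K_uy)
# u_tilde = u_t - g' y^{t-1}: cross-covariance with y^{t-1}, and second moment.
K_tilde_y = K_uy - K_yy @ g             # zero by construction (Projection Theorem)
var_tilde = K_uu - K_uy @ g             # K_uu minus a nonnegative quadratic form
print(np.allclose(K_tilde_y, 0.0), var_tilde <= K_uu)
```

For non-Gaussian noise the predictor is no longer linear in general, but the same orthogonality and power-reduction logic applies to the true conditional mean.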

IX. Asymptotic analysis of the feedback system 

So far we have completed our analysis in finite horizon. We have shown that the optimal design of encoder and decoder must contain a KF, and connected the feedback communication problem to an estimation problem and a control problem. Below, we briefly consider the steady-state communication problem by studying the limiting behavior ($T$ going to infinity) of the finite-horizon solution while fixing the encoder dimension to be $(n+1)$. The infinite-horizon capacity problem will not be considered in this paper. Here and hereafter, we make the following assumption unless otherwise specified:

(A2): $(A, C)$ is observable, and none of the eigenvalues of $A$ lies on the unit circle or at the location of an eigenvalue of $F$.



A. Convergence to steady-state 

The time-varying KF in (62) converges to a steady state; namely, (62) is stabilized in closed loop: the distributions of $u_t$, $e_t$, and $y_t$ converge to steady-state distributions, and $\Sigma_t$, $L_t$, the associated time-varying operators, and $K_{e,t}$ converge to their steady-state values. That is, asymptotically (62) becomes an LTI system



$$\text{steady state:}\quad \begin{cases} x_{t+1} = (A - LC')x_t - LN_t = Ax_t - Le_t,\\ e_t = C'x_t + N_t,\\ u_t = C'x_t, \end{cases} \qquad (92)$$



where 



$$L := \frac{A\Sigma C}{K_e}, \qquad (93)$$



$K_e = C'\Sigma C + 1$, and $\Sigma$ is the unique stabilizing solution to the Riccati equation

$$\Sigma = A\Sigma A' - \frac{A\Sigma C\,C'\Sigma A'}{C'\Sigma C + 1}. \qquad (94)$$

This LTI system is sometimes easy to analyze (e.g., it allows transfer-function-based study) and to implement. For instance, the cheap control result (cf. [21] and [44]) for an LTI system states that the transfer function from $N$ to $e$ is an all-pass function of the form

$$T_{Ne}(z) = \prod_{i=0}^{k}\frac{z - \lambda_i}{z - \lambda_i^{-*}}, \qquad (95)$$

where $\lambda_0, \cdots, \lambda_k$ are the unstable eigenvalues of $A$ (noting that $F$ is stable) and $\lambda_i^{-*} := 1/\bar{\lambda}_i$. Note that this is consistent with the whiteness of the innovations process $\{e_t\}$.
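The steady-state quantities in (92)-(95) can be reproduced numerically for a small example. The sketch below iterates the Riccati recursion from the assumed initial condition $\Sigma_0 = I$ (not necessarily the one in (66)), forms the gain (93), and evaluates $T_{Ne}$ on the unit circle. The example pair, with one unstable eigenvalue $2$ and one stable eigenvalue $0.5$, is a hypothetical choice; the channel dynamics $F$ are not modeled here.

```python
import numpy as np

def steady_state_kf(A, C, iters=500):
    """Iterate the singular Riccati recursion (no process noise, unit-variance
    measurement noise), starting from the assumed initial condition Sigma_0 = I."""
    Sigma = np.eye(A.shape[0])
    for _ in range(iters):
        ASC = A @ Sigma @ C
        Sigma = A @ Sigma @ A.T - np.outer(ASC, ASC) / (C @ Sigma @ C + 1.0)
    K_e = C @ Sigma @ C + 1.0
    L = A @ Sigma @ C / K_e                   # steady-state gain, cf. (93)
    return Sigma, L, K_e

def T_Ne(A, C, L, z):
    """Transfer function from N to e implied by (92): 1 - C'(zI - (A - L C'))^{-1} L."""
    A_cl = A - np.outer(L, C)
    return 1.0 - C @ np.linalg.solve(z * np.eye(A.shape[0]) - A_cl, L)

A = np.array([[2.0, 1.0], [0.0, 0.5]])       # eigenvalues 2 (unstable) and 0.5 (stable)
C = np.array([1.0, 0.0])
Sigma, L, K_e = steady_state_kf(A, C)
DI = np.prod([max(abs(lam), 1.0) for lam in np.linalg.eigvals(A)])   # degree of instability
mags = [abs(T_Ne(A, C, L, np.exp(2j * np.pi * th))) for th in np.linspace(0.0, 0.5, 7)]
print(K_e, DI ** 2)                          # steady-state innovations variance vs DI(A)^2
print(np.allclose(mags, DI))                 # flat gain DI(A) on the unit circle
```

For this pair, $K_e$ should match $DI(A)^2 = 4$ and $|T_{Ne}|$ should be flat at $DI(A) = 2$, in line with the whiteness of $\{e_t\}$; the closed-loop matrix $A - LC'$ keeps the stable eigenvalue $0.5$ and moves the unstable one to its mirror image $1/2$.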

The existence of a steady state of the KF is proven in the following proposition. Notice that (62) is a singular KF, since it has no process noise; the convergence of such a problem was established in [47].

Proposition 8. Consider the Riccati recursion (65) and the system (62). Assume (A2) and that $\lambda_0, \cdots, \lambda_k$ are the unstable eigenvalues of $A$.

i) Starting from the initial condition given in (66), the Riccati recursion (65) generates a sequence $\{\Sigma_t\}$ that converges to $\Sigma_\infty$, the unique stabilizing solution to the Riccati equation (94), and $\Sigma_\infty$ has rank $(k+1)$.
ii) The time-varying system (62) converges to the unique steady state given in (92).

Proof: See Appendix V. □



B. Steady-state quantities 

Now we fix $(A, C)$ and let the horizon $T$ in the coding structure $\mathcal{S}^*$ go to infinity. Let $H(e)$ be the entropy rate of $\{e_t\}$,

$$DI(A) := \prod_{i=0}^{k}|\lambda_i| \qquad (96)$$

be the degree of instability or the degree of anti-causality of $A$, and $\mathcal{S}(e^{j2\pi\theta}) := Y(e^{j2\pi\theta})/N(e^{j2\pi\theta})$ be the spectrum of the sensitivity function of system (92) (cf. [14]). Then the limiting result of Proposition 3 is summarized in the next proposition.

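Proposition 9. Consider the coding structure $\mathcal{S}^*$. For any $n \ge 0$ and $(A, C)$ with $A \in \mathbb{R}^{(n+1)\times(n+1)}$ satisfying (A2),

i) The asymptotic information rate is given by

$$\begin{aligned}
R_{\infty,n}(A,C) &:= \lim_{T\to\infty}\frac{1}{T+1}I(u^T \to y^T)\\
&= H(e) - \tfrac{1}{2}\log 2\pi e\\
&= \log DI(A)\\
&= \int_{-1/2}^{1/2}\log|\mathcal{S}(e^{j2\pi\theta})|\,d\theta\\
&= \tfrac{1}{2}\log(C'\Sigma C + 1)\\
&= \lim_{T\to\infty}\frac{\log\det I_{W,T}}{2(T+1)}\\
&= -\lim_{T\to\infty}\frac{\log\det\mathrm{MMSE}_{W,T}}{2(T+1)}\\
&= -\lim_{T\to\infty}\frac{\log\det\mathrm{CRB}_{W,T}}{2(T+1)}.
\end{aligned} \qquad (97)$$

ii) The average channel input power is given by

$$P_{\infty,n}(A,C) := \lim_{T\to\infty}\frac{1}{T+1}\mathbb{E}\|u^T\|^2 = C'\Sigma C. \qquad (98)$$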
Remark 6. Proposition 9 links the asymptotic information rate to the degree of anti-causality and the Bode sensitivity integral ([14]) of the control system, and to the entropy rate and steady-state variance of the innovations process, the asymptotic growth rate of the Fisher information, and the asymptotic decay rate of the smoothing MSE or of the CRB of the estimation system. Note that the Bode sensitivity integral is the fundamental limitation of the disturbance rejection (control) problem, and the asymptotic decay rate of the CRB is the fundamental limitation of the recursive estimation problem. Hence, the fundamental limitations in feedback communication, control, and estimation coincide. More specifically, the asymptotic information rate cannot be made higher or lower than a constant regardless of the choice of feedback generator; the disturbance rejection measure cannot be made smaller than a constant regardless of the feedback controller design; the decay rate of the estimation error cannot be made faster than a constant regardless of the estimator design; and this constant is the logarithm of the degree of anti-causality of $A$.

Remark 7. It is straightforward to extend the finite-horizon connections between the fundamental tradeoffs for feedback communication, estimation, and feedback control to infinite horizon. As the limits exist, the quantities in the fundamental tradeoffs (T1) through (T3) given in Section VII-C are well defined in infinite horizon, and the corresponding relationships still hold. Note that in infinite horizon it is more evident that the Bode integral is associated with anti-causality, since it equals the logarithm of the degree of anti-causality of $A$.

Proof: Proposition 8 implies that the limits of the results in Proposition 3 are well defined. Then

$$R_{\infty,n}(A,C) = \lim_{T\to\infty}\frac{1}{2(T+1)}\sum_{t=0}^{T}\log K_{e,t} = \frac{1}{2}\log K_e = H(e) - \frac{1}{2}\log 2\pi e, \qquad (99)$$

where the second equality is due to the Cesàro mean (i.e., if $a_k$ converges to $a$, then the average of the first $k$ terms converges to $a$ as $k$ goes to infinity), and the last equality follows from the definition of the entropy rate of a Gaussian process (cf. [36]).

Now by (95), $\{e_t\}$ has a flat power spectrum with magnitude $DI(A)^2$. Then $R_{\infty,n}(A,C) = \log DI(A)$. The Bode integral of sensitivity follows from [14]. The other equalities are direct applications of the Cesàro mean to the results in Proposition 3. □

Proposition 9 implies that the presence of stable eigenvalues in $A$ does not affect the rate (see also [14]). Stable eigenvalues do not affect $P_{\infty,n}(A,C)$ either, since the initial condition response associated with the stable eigenvalues can be tracked with zero power (i.e., zero average MSE). Therefore, we conclude that the presence of stable eigenvalues in $A$ affects neither the rate $R_{\infty,n}(A,C)$ nor the power $P_{\infty,n}(A,C)$. We have thus seen that the communication problem is essentially a problem of tracking an anti-causal source over a communication channel ([13], [14], [20]).
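The Cesàro-mean step and the claim that stable eigenvalues affect neither rate nor power can be seen side by side in a small experiment. The sketch below is an illustration under assumptions: it runs the time-varying Riccati recursion from $\Sigma_0 = I$ (not necessarily the initial condition (66)), compares a hypothetical pair with an extra stable eigenvalue against its anti-stable reduction (in the spirit of Corollary 1), and reports the finite-horizon averages, which should approach $\log DI(A) = \log 2$ and $DI(A)^2 - 1 = 3$ as $T$ grows.

```python
import numpy as np

def finite_horizon_rate_power(A, C, T):
    """Finite-horizon averages (1/(2(T+1))) sum_t log K_{e,t} (nats per use) and
    (1/(T+1)) sum_t C' Sigma_t C from the time-varying Riccati recursion,
    started from the assumed initial condition Sigma_0 = I."""
    Sigma = np.eye(A.shape[0])
    log_ke, power = [], []
    for _ in range(T + 1):
        p = C @ Sigma @ C
        power.append(p)
        log_ke.append(np.log(p + 1.0))
        ASC = A @ Sigma @ C
        Sigma = A @ Sigma @ A.T - np.outer(ASC, ASC) / (p + 1.0)
    return 0.5 * np.mean(log_ke), np.mean(power)

# Pair with an extra stable eigenvalue (0.5) and its anti-stable reduction.
A_full, C_full = np.array([[2.0, 1.0], [0.0, 0.5]]), np.array([1.0, 0.0])
A_red, C_red = np.array([[2.0]]), np.array([1.0])
for T in (10, 100, 2000):
    print(T, finite_horizon_rate_power(A_full, C_full, T),
          finite_horizon_rate_power(A_red, C_red, T))
print("limits:", np.log(2.0), 2.0 ** 2 - 1.0)   # log DI(A) and DI(A)^2 - 1
```

The two pairs differ only in their transients, whose contribution to the averages decays like $1/T$; this is exactly the Cesàro-mean argument used in the proof.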

Corollary 1. Suppose that $(A, C)$ with $A \in \mathbb{R}^{(n+1)\times(n+1)}$ satisfies (A2). Suppose further that $A$ has $(k+1)$ unstable eigenvalues, denoted $\lambda_0, \cdots, \lambda_k$, where $0 \le k \le n$. Then there exists an observable pair $(A_k, C_k)$ with $A_k \in \mathbb{R}^{(k+1)\times(k+1)}$ being anti-stable such that $R_{\infty,n}(A,C) = R_{\infty,k}(A_k,C_k)$ and $P_{\infty,n}(A,C) = P_{\infty,k}(A_k,C_k)$.

Proof: See Appendix VI. □



X. Conclusions and future work

In this paper, we proposed a perspective that integrates information transmission (communication), information processing (estimation), and information utilization (control). We identified and explored fundamental limitations in feedback communication, estimation, and feedback control over Gaussian channels with memory. Specifically, we
established a certain equivalence among a feedback communication system, an estimation system, and a feedback control system. We demonstrated that a simple reformulation of the Kalman filter becomes the celebrated Schalkwijk-Kailath codes, and that the well-studied Cover-Pombra structure necessarily contains a Kalman filter in order to be optimal. We characterized the roles of Kalman filtering in an optimal feedback communication system as ensuring power efficiency and optimally recovering the transmitted codewords. We showed that the fundamental limitations/tradeoffs in these three systems also coincide: the power versus rate tradeoff in feedback communication, the causal prediction versus smoothing tradeoff in estimation, and the control performance versus Bode integral tradeoff in control are equivalent, and in essence all of them are causality versus anti-causality tradeoffs. We also presented a coding scheme achieving the finite-horizon feedback capacity of the Gaussian channel. The scheme is based on the Kalman filtering algorithm, and provides refinements and extensions to the Cover-Pombra coding structure and Schalkwijk-Kailath codes.

Our new perspective has recently been generalized in [22] to uniformly address the fundamental limits of several classes of feedback communication problems, and we envision that this perspective can open a new avenue for studying more general feedback communication problems, such as multiuser feedback communications. Our ongoing research includes extending our proposed scheme to address the optimality of more feedback communication problems (such as single-user MIMO systems with output feedback and multi-user MIMO systems with output feedback). We also anticipate that the perspective and the approaches developed in this paper will be extended and will help to build a theoretically and practically sound paradigm that unifies information, estimation, and control.



Appendix I

Systems representations and equivalence

The concept of system representations and the equivalence between different representations are used extensively in this paper. In this appendix, we briefly introduce system representations and their equivalence. For a more thorough treatment, see, e.g., [48]-[50].

A. Systems representations

Any discrete-time linear system can be represented as a linear mapping (or a linear operator) from its input space to its output space; for example, we can describe a single-input single-output (SISO) linear system as

$$y^t = \mathcal{M}_t u^t \qquad (100)$$

for any $t$, where $\mathcal{M}_t \in \mathbb{R}^{(t+1)\times(t+1)}$ is the matrix representation of the linear operator, $u^t \in \mathbb{R}^{t+1}$ is the stacked input vector consisting of the inputs from time 0 to time $t$, and $y^t \in \mathbb{R}^{t+1}$ is the stacked output vector consisting of the outputs from time 0 to time $t$. For a (strictly) causal SISO LTI system, $\mathcal{M}_t$ is a (strictly) lower triangular Toeplitz matrix formed by the coefficients of the impulse response. Such a system may also be described by its (reduced) transfer function, whose inverse z-transform is the impulse response; by a (reduced) transfer function we mean one whose zeros are not at the location of any pole.

A causal SISO LTI system can be realized in state space as

$$x_{t+1} = Ax_t + Bu_t, \qquad y_t = C'x_t + Du_t, \qquad (101)$$

where $x_t \in \mathbb{R}^l$ is the state, $u_t \in \mathbb{R}$ is the input, $y_t \in \mathbb{R}$ is the output, $A$ is the state matrix, $B$ is the input matrix (vector), $C$ is the output matrix (vector), and $D$ is the direct feedthrough term. We call $l$ the dimension or the order of the realization. The state-space representation (101) may be conveniently denoted as $(A, B, C', D)$. Note that in the study of input-output relations, it is sometimes convenient to assume that the system is relaxed or at initial rest (i.e., zero input leads to zero output), whereas in the study of state space, we generally allow $x_0 \neq 0$, which is not at initial rest. For multi-input multi-output (MIMO) systems, linear time-varying systems, etc., see [49], [50].

The state-space representation of a causal FDLTI system $\mathcal{M}(z)$ is not unique. We call a realization $(A, B, C', D)$ minimal if $(A, B)$ is controllable and $(A, C')$ is observable. All minimal realizations of $\mathcal{M}(z)$ have the same dimension, which is the minimum dimension of all possible realizations. All other realizations are called non-minimal. The transfer function for the state-space representation $(A, B, C', D)$ is $C'(zI - A)^{-1}B + D$.
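The correspondence between the operator form (100) and a realization (101) can be illustrated by building $\mathcal{M}_t$ from the impulse response $h_0 = D$, $h_k = C'A^{k-1}B$ and comparing it against a direct state-space simulation from initial rest. The realization below is a hypothetical example chosen only for illustration.

```python
import numpy as np

def toeplitz_from_realization(A, B, C, D, t):
    """Matrix M_t of (100) for the SISO realization (A, B, C', D) of (101):
    lower-triangular Toeplitz with impulse response h_0 = D, h_k = C' A^{k-1} B."""
    h = [D]
    Ak = np.eye(A.shape[0])
    for _ in range(t):
        h.append(C @ Ak @ B)
        Ak = Ak @ A
    M = np.zeros((t + 1, t + 1))
    for i in range(t + 1):
        for j in range(i + 1):
            M[i, j] = h[i - j]
    return M

def simulate(A, B, C, D, u):
    """Run (101) from initial rest (x_0 = 0) and return y_0, ..., y_t."""
    x = np.zeros(A.shape[0])
    y = np.empty(len(u))
    for k, uk in enumerate(u):
        y[k] = C @ x + D * uk
        x = A @ x + B * uk
    return y

A = np.array([[0.5, 1.0], [0.0, -0.3]])   # hypothetical example realization
B = np.array([0.0, 1.0])
C = np.array([1.0, 0.0])
D = 1.0
u = np.random.default_rng(2).standard_normal(8)
M = toeplitz_from_realization(A, B, C, D, t=len(u) - 1)
print(np.allclose(M @ u, simulate(A, B, C, D, u)))   # y^t = M_t u^t at initial rest
```

Setting $D = 0$ makes $\mathcal{M}_t$ strictly lower triangular, matching the strictly causal case described above.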

Example: Derivation of state-space representation of G*t{A ,C_) 

We demonstrate here how we can derive a realization of a system. Consider G"r{A,Q) m fiU m Section [Vl 
which is given by 

Qt{A,C) = -GW-Zt 1 G*t)~ X , (102) 

where the state-space representations for Q* T (A, C) and Z^ 1 are illustrated in Fig.[8](b) and Fig. 0(c). This result 
shows that the block diagram in Fig.|6](c) is indeed the dynamics of Q* T , as claimed in Proposition [2] iii). 

Since ( 1102b suggests a feedback connection of Q* and Z~ x as shown in Fig. [10] we can write the state-space 
for Q* as 

SLt+i = -^S-t + iii,t e t 



e t 



£.a,t+l 
It 



Fh + Gft 



y, 



H's t 



L.2,t e t 

ft 



> Kalman filter Q^p{A,CT) 



= Fs a ^ t + Gf t 
= vt+ ii's a , t + h 



(103) 



Letting s t 



2-t ia.t 



the above reduces to 



ft 

S.t+1 

e-t 



Ax t + L lt e t 
Fit+L. 2 . t e t 

yt -M!lf 



(104) 



This is the dynamics shown in Fig. 6(c). Note that the above reduction of the realization is allowed since it preserves T-equivalence; see the next subsection.




Fig. 10. Q* is a feedback connection of Q* and Z 1 . 



Example: State-space representation of an inverse of a system 
Given a linear system with input $u$ and output $y$ represented as $(A, B, C', 1)$ in state space, it can be inverted if it is both stable and minimum-phase. The inverse system that maps $y$ back to $u$ can be realized in state space as $(A - BC', -B, C', 1)$.
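The inverse realization can be sanity-checked by cascading the two systems. In the sketch below (a hypothetical stable, minimum-phase example), both systems start at rest, so the reconstruction is exact on any finite horizon; stability and minimum phase are what keep the inverse well behaved over an infinite horizon, e.g., when initial conditions do not match.

```python
import numpy as np

def simulate(A, B, C, D, u):
    """Run x_{k+1} = A x_k + B u_k, y_k = C' x_k + D u_k from initial rest."""
    x = np.zeros(A.shape[0])
    y = np.empty(len(u))
    for k, uk in enumerate(u):
        y[k] = C @ x + D * uk
        x = A @ x + B * uk
    return y

# Hypothetical stable, minimum-phase example: eig(A) = {0.4, -0.5},
# zeros = eig(A - B C') = {-0.1, -0.2}.
A = np.array([[0.4, 0.2], [0.0, -0.5]])
B = np.array([1.0, 0.5])
C = np.array([0.3, -0.2])
u = np.random.default_rng(3).standard_normal(50)
y = simulate(A, B, C, 1.0, u)                          # system (A, B, C', 1)
u_rec = simulate(A - np.outer(B, C), -B, C, 1.0, y)    # inverse (A - BC', -B, C', 1)
print(np.allclose(u_rec, u))
```

The sign flip on $B$ and the unchanged output vector $C$ correspond to using $-x$ as the state of the more common realization $(A - BC', B, -C', 1)$; both describe the same input-output map.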

Example: State-space representations of Z z and Z p 

It is easily shown that $Z_z(z)$ can be realized as $(F_z, G_z, H', 1)$, $Z_p(z)$ as $(F_z, G_p, H', 1)$, and $Z_z^{-1}(z)$ as $(F, -G_z, H', 1)$, where $F_z := F + G_zH'$.

B. Equivalence between representations 

Definition 2. i) Two FDLTI systems represented in state space are said to be equivalent if they admit a common transfer function (or a common transfer function matrix) and they are both stabilizable and detectable.

ii) Fix $0 \le T < \infty$. Two linear mappings $\mathcal{M}_{i,T}: \mathbb{R}^{q(T+1)} \to \mathbb{R}^{p(T+1)}$, $i = 1, 2$, are said to be T-equivalent if for any $u^T \in \mathbb{R}^{q(T+1)}$, it holds that

$$\mathcal{M}_{1,T}(u^T) = \mathcal{M}_{2,T}(u^T). \qquad (105)$$

We note that i) is defined for FDLTI systems, whereas ii) is for general linear systems. i) implies that the realizations of a transfer function are not necessarily equivalent. However, if we restrict attention to realizations that do not "hide" any unstable modes, namely, all the unstable modes are both controllable from the input and observable from the output, they are equivalent; the converse is also true. ii) concerns only the finite-horizon input-output relations. Since the states are not specified in ii), it does not readily extend to infinite horizon: any unstable modes "hidden" from the input and output will grow unboundedly regardless of the input and output, which is unwanted.
Example: T-equivalence between the estimation system (57) and coding structure $\mathbb{S}$ (56)

To show T-equivalence, it is sufficient to show that, for each $t$, the signals $r_t$, $\hat{r}_t$, $e_t$, $x_t$, and $\hat{x}_t$ in (57) and (56) are equal, respectively. To this aim, first note that for $t = 0$ the respective signals are equal, and that the channel-state estimation errors $s_0 - \hat{s}_0$ of the two systems coincide. Now use induction. Assume that for $t \le \tau$ the respective signals are equal and that the channel-state estimation errors $s_\tau - \hat{s}_\tau$ of the two systems coincide. Clearly, $r_{\tau+1}$ and $x_{\tau+1}$ generated by (57) and (56) are equal, respectively. Then, in both systems,

$$s_{\tau+1} - \hat{s}_{\tau+1} = F(s_\tau - \hat{s}_\tau) + G(r_\tau - \hat{r}_\tau) - L_{2,\tau}e_\tau, \tag{106}$$

so the estimation errors also coincide at time $\tau + 1$, and $e_{\tau+1}$ from both (57) and (56) equals

$$M'(s_{\tau+1} - \hat{s}_{\tau+1}) + (r_{\tau+1} - \hat{r}_{\tau+1}) + N_{\tau+1}. \tag{107}$$

Thus we have proven T-equivalence.

Likewise, we can show that the estimation system (57), the feedback communication system (56), and the control system (62) are T-equivalent.

Examples 

As we mentioned in Section III-B, for any $u^T$ and $N^T$, Fig. 3(a) and (b) generate the same channel output $y^T$. That is, the mappings from $(u^T, N^T)$ to $y^T$ for the two channels are identical, and both are given by

$$y^T = Z_T^{-1}(u^T + Z_TN^T) = Z_T^{-1}u^T + N^T. \tag{108}$$

Thus, we say the two channels are T-equivalent.
Appendix II

Input/output characterization of finite-horizon information capacity: Directed information

Definition 3. The directed information from $u^T$ to $y^T$ is defined as

$$I(u^T \to y^T) := \sum_{t=0}^{T} I(u^t; y_t \mid y^{t-1}). \tag{109}$$

See [11] for details. An important feature of the directed information is that it is an input-output counterpart of the mutual information, which makes it especially useful for dealing with channels inside feedback loops.

Proposition 10 (Tatikonda and Mitter). It holds that

$$C_T(\mathcal{P}) = \sup_{u^T} \frac{1}{T+1} I(u^T \to y^T), \tag{110}$$

where the supremum is over all feedback-dependent Gaussian input distributions satisfying the power constraint

$$\frac{1}{T+1}\mathbb{E}\,u^{T\prime}u^T \le \mathcal{P} \tag{111}$$

and of the form

$$u_t = \gamma_t'u^{t-1} + \eta_t'y^{t-1} + \xi_t \tag{112}$$

for any $\gamma_t \in \mathbb{R}^t$, $\eta_t \in \mathbb{R}^t$, and zero-mean Gaussian random variable $\xi_t \in \mathbb{R}$ independent of $u^{t-1}$ and $y^{t-1}$.

This proposition follows directly from the following lemma. 

Lemma 3. The CP structure for the ISI Gaussian channel shown in Fig. 4 can generate any Gaussian channel input process $\{u_t\}$ of the form (112), and vice versa.

Proof: Note that any input generated by the scheme in Fig. 4 has the form

$$u^t = B_tZ_tN^t + v^t = B_tZ_ty^t - B_tu^t + v^t, \tag{113}$$

leading to

$$u^t = (I + B_t)^{-1}B_tZ_ty^t + (I + B_t)^{-1}v^t, \tag{114}$$

where $v^t$ is independent of $N^t$ and hence of $Z_tN^t$.
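On the other hand, from (112) we have

$$u^t = \bar{\gamma}_tu^t + \bar{\eta}_ty^t + \xi^t, \tag{115}$$

leading to

$$u^t = (I - \bar{\gamma}_t)^{-1}\bar{\eta}_ty^t + (I - \bar{\gamma}_t)^{-1}\xi^t, \tag{116}$$

where $\bar{\gamma}_t \in \mathbb{R}^{(t+1)\times(t+1)}$ is the strictly lower triangular matrix formed by $\gamma_1', \ldots, \gamma_t'$, and $\bar{\eta}_t \in \mathbb{R}^{(t+1)\times(t+1)}$ is the strictly lower triangular matrix formed by $\eta_1', \ldots, \eta_t'$. Since for any $\tau \le t$, $\xi_\tau$ is independent of $u^{\tau-1}$ and $y^{\tau-1}$, it is independent of $N^{\tau-1}$, which is determined by $u^{\tau-1}$ and $y^{\tau-1}$. By causality, $\xi_\tau$ is also independent of $N_\tau, N_{\tau+1}, \ldots$. Therefore, $\xi^t$ is independent of $N^t$ and hence of $Z_tN^t$. The lemma then follows by comparing (114) and (116). □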
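Comparing (114) with (116) amounts to matching the two feedback matrices. One explicit way to do this, a construction chosen here for illustration and not taken from the paper, starts from the strictly lower triangular matrices of (115)-(116) built from the $\gamma$'s and $\eta$'s (called `gam` and `eta` in the code). It sets $K := (I - \bar{\gamma})^{-1}\bar{\eta}Z_t^{-1}$ and $B_t := K(I - K)^{-1}$. This $B_t$ is strictly lower triangular and satisfies $(I + B_t)^{-1}B_t = K$. The NumPy sketch below checks the identity on random, hypothetical data.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 5                                                   # t + 1
I = np.eye(n)
gam = np.tril(rng.standard_normal((n, n)), k=-1)        # strictly lower triangular, built from the gammas
eta = np.tril(rng.standard_normal((n, n)), k=-1)        # strictly lower triangular, built from the etas
Z = np.eye(n) + np.tril(rng.standard_normal((n, n)), k=-1)   # hypothetical invertible lower triangular Z_t

# Feedback matrix of the (112)-type input, as in (116): (I - gam)^{-1} eta
target = np.linalg.solve(I - gam, eta)

# Illustrative CP parameters: K := (I - gam)^{-1} eta Z^{-1},  B := K (I - K)^{-1}
K = target @ np.linalg.inv(Z)
B = K @ np.linalg.inv(I - K)

# Feedback matrix of the CP structure, as in (114): (I + B)^{-1} B Z
cp = np.linalg.solve(I + B, B @ Z)
# The noise terms match with v := (I + B)(I - gam)^{-1} xi, since (I + B)^{-1} v = (I - gam)^{-1} xi.
print(np.max(np.abs(cp - target)), np.max(np.abs(np.triu(B))))   # both at round-off level
```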
Proof of Proposition 10: This proposition follows directly from the observation that, for any Gaussian input of the form (112), it holds that

$$\begin{aligned}
I(u^T \to y^T) &= \sum_{t=0}^{T}\big(h(y_t \mid y^{t-1}) - h(y_t \mid y^{t-1}, u^t)\big)\\
&= h(y^T) - h(N^T)\\
&= I(\xi^T; y^T).
\end{aligned}\tag{117}$$

□



Note that the above proof can easily be adapted to show that $I(u^T \to y^T)$ equals $I(W; y^T)$ and $I(\xi^T; y^T)$, where $W$ is the message. That is, the directed information from the input signal to the output signal (both of which are inside the feedback loop and causally affect each other) captures the mutual information between the message $W$ (or a message-carrying signal outside the feedback loop, such as $\xi^T$ or $r^T$) and the output signal. Therefore, directed information can capture the capacity without the need to identify the message; see, e.g., [12].
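The chain of equalities in (117) can be checked numerically on a small Gaussian example. The setup below is hypothetical and chosen only for illustration:

- a monic lower-triangular Toeplitz $Z_T$, so that $\det Z_T = 1$;
- the ISI channel $y^T = Z_T^{-1}u^T + N^T$ with white unit-variance $N^T$, consistent with (113);
- linear feedback $u^T = r^T + \mathcal{G}y^T$, where the strictly lower triangular gain is called `Gfb` and $r^T$ is a Gaussian message-carrying signal independent of $N^T$.

The sketch evaluates the directed information term by term from Gaussian conditional variances and compares it with $\frac{1}{2}\log\det K_y$ and with $I(r^T; y^T)$.

```python
import numpy as np

def cond_var(K, i, idx):
    """Variance of component i given the components in idx (zero-mean Gaussian, covariance K)."""
    if len(idx) == 0:
        return K[i, i]
    Kab = K[i, idx]
    return K[i, i] - Kab @ np.linalg.solve(K[np.ix_(idx, idx)], Kab)

rng = np.random.default_rng(1)
T1 = 6                                              # horizon length T + 1
zcoef = [1.0, 0.5, -0.2]                            # hypothetical monic impulse response of Z(z)
Z = sum(c * np.eye(T1, k=-k) for k, c in enumerate(zcoef))   # lower-triangular Toeplitz, det Z = 1
Zinv = np.linalg.inv(Z)
Gfb = np.tril(0.3 * rng.standard_normal((T1, T1)), k=-1)    # strictly lower triangular feedback gain
Lr = rng.standard_normal((T1, T1))
Kr = Lr @ Lr.T + 0.1 * np.eye(T1)                   # covariance of the message-carrying signal r

# y = Z^{-1} u + N and u = r + Gfb y  =>  y = M (Z^{-1} r + N) with M = (I - Z^{-1} Gfb)^{-1}
M = np.linalg.inv(np.eye(T1) - Zinv @ Gfb)
Yr, Yn = M @ Zinv, M
Ur, Un = np.eye(T1) + Gfb @ Yr, Gfb @ Yn
Wr, Wn = np.vstack([Ur, Yr]), np.vstack([Un, Yn])   # [u; y] = Wr r + Wn N
K = Wr @ Kr @ Wr.T + Wn @ Wn.T                      # joint covariance of (u^T, y^T), N ~ N(0, I)

# Directed information: sum_t [h(y_t | y^{t-1}) - h(y_t | y^{t-1}, u^t)]
u_idx, y_idx = list(range(T1)), list(range(T1, 2 * T1))
DI = 0.0
for t in range(T1):
    v_past = cond_var(K, y_idx[t], y_idx[:t])
    v_full = cond_var(K, y_idx[t], y_idx[:t] + u_idx[:t + 1])   # equals var(N_t) = 1
    DI += 0.5 * np.log(v_past / v_full)

half_logdet_Ky = 0.5 * np.linalg.slogdet(K[T1:, T1:])[1]
I_ry = half_logdet_Ky - 0.5 * np.linalg.slogdet(Yn @ Yn.T)[1]   # I(r^T; y^T) = h(y) - h(y | r)
print(DI, half_logdet_Ky, I_ry)                     # the three values agree
```

The equality with $I(r^T; y^T)$ relies on $I - Z_T^{-1}\mathcal{G}$ being unit lower triangular, so that conditioning on $r^T$ leaves a noise term with the same entropy as $N^T$.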



Appendix III 

Correspondence between the CP structure and coding structure $\mathbb{S}$



i) Assume $K_r^{(T)} > 0$ first. For any fixed $(K_r^{(T)}, B_T)$ in the CP structure, define in $\mathbb{S}$ that

$$\mathcal{G}_T := (I + B_T)^{-1}B_TZ_T, \qquad A := \Gamma_a^{-1}A_a\Gamma_a, \qquad C := \Gamma_a\,[1\;\,0\;\,\cdots\;\,0]' =: \Gamma_ae_1, \tag{118}$$

where $\Gamma_a := (K_r^{(T)})^{1/2}$ is a positive definite square root,

$$A_a := \begin{bmatrix} 0 & 1 & & \\ & \ddots & \ddots & \\ & & 0 & 1 \\ * & & & 0 \end{bmatrix},$$

$*$ can be any number, and $e_1$ is the first standard basis vector (this vector $e_1$ should not be confused with the Kalman filter innovation $e_t$). It is then easily verified that $\mathcal{G}_T$ is strictly lower triangular and that $(A, C')$ is observable. One can compute that the observability matrix of $(A, C')$ is in fact $\Gamma_a$, that is,

$$\begin{bmatrix} C' \\ C'A \\ \vdots \\ C'A^T \end{bmatrix} = \begin{bmatrix} e_1'\Gamma_a \\ e_1'A_a\Gamma_a \\ \vdots \\ e_1'A_a^T\Gamma_a \end{bmatrix} = \Gamma_a, \tag{119}$$

where the last equality is due to the structures of $A_a$ and $e_1$. Thus, $(A, C')$ can generate a process $r^T$ with covariance matrix $K_r^{(T)}$. Then, by (33), for any given $(K_r^{(T)}, B_T)$ with $K_r^{(T)} > 0$, we can find an admissible $(A, C, \mathcal{G}_T)$ generating the same channel input $u^T$ as $(K_r^{(T)}, B_T)$ does.
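The observability-matrix identity (119) can be checked numerically. The structure of $A_a$ used here is an assumption made for this sketch: ones on the superdiagonal and the free entry $*$ in the lower-left corner. It is one choice satisfying $e_1'A_a^k = e_{k+1}'$ for $k \le T$. It also places all eigenvalues of $A$ on the circle of radius $|*|^{1/(T+1)}$, so they avoid the unit circle whenever $|*| \ne 1$. The matrix $K_r^{(T)}$ and the value of $*$ are random or hypothetical.

```python
import numpy as np

def sym_sqrt(K):
    """Symmetric positive definite square root via an eigendecomposition."""
    w, V = np.linalg.eigh(K)
    return V @ np.diag(np.sqrt(w)) @ V.T

rng = np.random.default_rng(3)
T1 = 5                                    # T + 1
L = rng.standard_normal((T1, T1))
Kr = L @ L.T + 0.5 * np.eye(T1)           # a positive definite K_r^{(T)}
Gam = sym_sqrt(Kr)                        # Gamma_a

star = 2.0                                # the free entry "*" (hypothetical value)
Aa = np.eye(T1, k=1)                      # ones on the superdiagonal ...
Aa[-1, 0] = star                          # ... and * in the lower-left corner (assumed structure)
e1 = np.zeros(T1)
e1[0] = 1.0

A = np.linalg.solve(Gam, Aa @ Gam)        # A := Gamma_a^{-1} A_a Gamma_a
C = Gam @ e1                              # C := Gamma_a e_1

# Observability matrix [C'; C'A; ...; C'A^T]
O = np.vstack([C @ np.linalg.matrix_power(A, k) for k in range(T1)])
print(np.max(np.abs(O - Gam)))            # round-off level
print(np.abs(np.linalg.eigvals(A)))       # all equal to |*|^{1/(T+1)}
```

Since $r^T = \Gamma_T(A, C)x_0$, the identity $OO' = \Gamma_a\Gamma_a' = K_r^{(T)}$ is what lets $(A, C')$ reproduce the prescribed covariance, assuming a unit-covariance initial state.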

Now consider the case where $K_r^{(T)} \ge 0$ but is not positive definite. Consider the positive definite sequence $\{K_r^{(T)} + \frac{1}{i}I\}_{i=1}^{\infty}$, which converges to $K_r^{(T)}$. For each pair $(K_r^{(T)} + \frac{1}{i}I, B_T)$, we can find an admissible triple $(A_i, C_i, \mathcal{G}_{T,i})$ corresponding to it via the above construction. It is easily shown that this sequence of triples generates a sequence of inputs that converges to $u^T(K_r^{(T)}, B_T)$. Note, however, that the power constraint and the rate constraint given in Definition 1 are not considered here, and hence they may not hold unless some further constraint on the sequence is imposed.

16 $A$ can also be chosen such that its eigenvalues are neither on the unit circle nor at the locations of $F$'s eigenvalues, as Assumption (A2) requires in order to guarantee convergence in Section IX.
ii) Conversely, for any fixed admissible $(A, C, \mathcal{G}_T)$ with $A \in \mathbb{R}^{(T+1)\times(T+1)}$, we can obtain an admissible $(K_r^{(T)}, B_T)$ as

$$B_T := \mathcal{G}_TZ_T^{-1}(I - \mathcal{G}_TZ_T^{-1})^{-1}, \qquad K_r^{(T)} := \Gamma_T(A, C)\,\Gamma_T(A, C)', \tag{120}$$

which generates the same channel input $u^T$ as $(A, C, \mathcal{G}_T)$ does.

iii) By the continuity of the mutual information and the power, the limits of the power sequence and of the mutual information sequence generated by $u^T(A_i, C_i, \mathcal{G}_{T,i})$ are equal to the power and the mutual information generated by $u^T(K_r^{(T)}, B_T)$, respectively. Note that $P_{T,T}(\mathcal{R})$ is the infimum power over all admissible $(A, C, \mathcal{G}_T)$ with $A \in \mathbb{R}^{(T+1)\times(T+1)}$ according to Definition 1, and that $P_T(\mathcal{R})$ is the infimum power over all admissible $(K_r^{(T)}, B_T)$ according to (29), subject to the rate constraints in (36) and (42), respectively. Hence, to show iii), it is sufficient to show that the equivalent inputs (or equivalent input sequences) satisfy the rate constraint in Definition 1. By the construction of the sequence of admissible triples $\{(A_i, C_i, \mathcal{G}_{T,i})\}_{i=1}^{\infty}$, it is straightforward to see that the rate constraint (i.e., $I(W; y^T)/(T+1) \ge \mathcal{R}$) can be satisfied, yielding $P_T(\mathcal{R}) = P_{T,T}(\mathcal{R})$ and hence $C_T(\mathcal{P}) = C_{T,T}(\mathcal{P})$.

Alternatively, one can directly prove $C_T(\mathcal{P}) = C_{T,T}(\mathcal{P})$ without resorting to $P_{T,T}(\mathcal{R})$. To this aim, apply an arbitrarily small reduction $\epsilon > 0$ to the power budget $\mathcal{P}$, that is, consider only those $(K_r^{(T)}, B_T)_\epsilon$ such that $\mathbb{E}\|u^T\|^2/(T+1) \le \mathcal{P} - \epsilon$. As $\epsilon$ vanishes, the rates achieved by those $(K_r^{(T)}, B_T)_\epsilon$ can arbitrarily approach $C_T(\mathcal{P})$ or $C_{T,T}(\mathcal{P})$. Now fix $(K_r^{(T)}, B_T)_\epsilon$ and construct an admissible sequence $(A_i, C_i, \mathcal{G}_{T,i})_\epsilon$ as in i). Clearly, for sufficiently large $i$, the input $u^T(A_i, C_i, \mathcal{G}_{T,i})_\epsilon$ satisfies the power constraint $\mathbb{E}\|u^T\|^2/(T+1) \le \mathcal{P}$. That is, any rate achievable by $u^T(K_r^{(T)}, B_T)_\epsilon$ can be arbitrarily approached by $u^T(A_i, C_i, \mathcal{G}_{T,i})_\epsilon$ satisfying the power constraint. The result then follows.

Appendix IV 
Proof of properties of the coding structure S* 

1) Proof of Proposition 4: Here we show that the coding structure $\mathbb{S}^*$, in the form of (62), satisfies the necessary condition for optimality presented in Proposition 4.

Since $\{y_t\}$ is interchangeable with the innovations process $\{e_t\}$, in the sense that they determine each other causally and linearly, it suffices to show that $\mathbb{E}u_te_\tau = 0$ for all $\tau < t$. Note that

$$u_t = D'X_t = D'AX_{t-1} - D'L_{t-1}e_{t-1}, \tag{121}$$

and thus

$$\begin{aligned}
\mathbb{E}u_te_{t-1} &= \mathbb{E}D'AX_{t-1}e_{t-1} - D'L_{t-1}\Lambda_{e,t-1}\\
&\overset{(a)}{=} \mathbb{E}D'AX_{t-1}X_{t-1}'C + \mathbb{E}D'AX_{t-1}N_{t-1} - D'A\Sigma_{t-1}C\\
&= D'A\Sigma_{t-1}C + 0 - D'A\Sigma_{t-1}C = 0,
\end{aligned}\tag{122}$$

where (a) follows from (62) and (64). Similarly, we can prove that $\mathbb{E}u_te_\tau = 0$ for any $\tau < t - 1$.

2) Proof of Proposition 5: We first prove a simple technical lemma that is useful in the following development. It states that the product of a banded, lower triangular matrix with bandwidth $(m+1)$ and an upper triangular matrix is banded in its lower triangular part with bandwidth $(m+1)$.

Lemma 4. Suppose $A \in \mathbb{R}^{n\times n}$ is banded and lower triangular with bandwidth $(m+1)$, i.e., $A(i,k) = 0$ if $i < k$ or $i > k + m$. Suppose $B \in \mathbb{R}^{n\times n}$ is upper triangular, i.e., $B(k,j) = 0$ if $k > j$. Then $C := AB$ is banded in its lower triangular part with bandwidth $(m+1)$, i.e., $C(i,j) = 0$ if $i > j + m$.

Proof: Note that $C(i,j) = \sum_{k=1}^{n} A(i,k)B(k,j)$. If $i > j + m$, then $A(i,k)B(k,j) = 0$ for every $k$: either $k > j$, so that $B(k,j) = 0$, or $k \le j < i - m$, so that $A(i,k) = 0$.

□
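Lemma 4 is easy to confirm numerically. The sketch below uses NumPy and random, hypothetical matrices. It builds a lower triangular $A$ with bandwidth $(m+1)$ and an upper triangular $B$, then checks that $AB$ vanishes below the band.

```python
import numpy as np

rng = np.random.default_rng(4)
n, m = 8, 2
A = np.tril(rng.standard_normal((n, n)))
A -= np.tril(A, k=-(m + 1))            # zero out entries with i > k + m: bandwidth (m + 1)
B = np.triu(rng.standard_normal((n, n)))
C = A @ B

i, j = np.indices((n, n))
print(np.abs(C[i > j + m]).max())      # 0.0: C vanishes below the band
```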
Now we go back to Proposition 5. To show this proposition, it is sufficient to show that $K_y^{(T)}$ is banded in its lower triangular part with bandwidth $(m+1)$, since $K_y^{(T)}$ is symmetric. By Lemma 4, we only need to show that $K_y^{(T)}$ can be written as a sum of products of banded, lower triangular matrices (of bandwidth $(m+1)$) with upper triangular matrices. To this aim, some algebra shows that

en t u t ' = vn t ((i -GtZt 1 )^ ~g T N T y = -G' T 

Z p , T Eu T £> = Z p , T KlZ' PiT +Z p , T Eu T N T 'Z' ZtT ^ 

= Zp^K^Z'p T + Z Z ^Z' Z t + Zp^^l^ N^ 1 Z' z T — Z z ,tG' t Z' p T 
= Z p ^ T 'Eu T y r ' + Z ZtT Z' z T - Z ZtT G' T Z' PjT . 

As $Z_{z,T}$ is lower triangular, $\mathbb{E}u^Ty^{T\prime}$ is upper triangular. Therefore, on the right-hand side of the last equality, $Z_{p,T}$ and $Z_{z,T}$ are banded and lower triangular with bandwidth $(m+1)$, and $\mathbb{E}u^Ty^{T\prime}$, $Z_{z,T}'$, and $\mathcal{G}_T'Z_{p,T}'$ are upper triangular. The result then follows.

As an alternative proof, or a verification of the above result, consider the mapping $\mathcal{M}_{e,y}$ from $e^T$ to $y^T$ (incorporating the feedback loop). Since

$$K_y^{(T)} = \mathcal{M}_{e,y}K_e^{(T)}\mathcal{M}_{e,y}' \tag{124}$$

and $K_y^{(T)} > 0$, $\mathcal{M}_{e,y}$ is lower triangular and uniquely defined (cf. [23] for relevant discussions of innovation processes, QR factorization, and Cholesky factorization). It is sufficient to show that $\mathcal{M}_{e,y}$ is banded and lower triangular with bandwidth $(m+1)$. To this aim, we characterize $\mathcal{M}_{e,y}$ in state space. The state-space representation from $y^T$ to $e^T$ is $(F - L_{2,t}H', L_{2,t}, -H', 1)$, and hence the one from $e^T$ back to $y^T$ is $(F, -L_{2,t}, -H', 1)$ (see Appendix I-A). The subsequent filter $Z_z(z)$ has the state-space representation $(F_z, G_z, H', 1)$. Hence, the state-space representation of the cascade, i.e., of $\mathcal{M}_{e,y}$, is

$$\left(\begin{bmatrix} F & 0 \\ -G_zH' & F_z \end{bmatrix}, \begin{bmatrix} -L_{2,t} \\ G_z \end{bmatrix}, (-H', H'), 1\right), \tag{125}$$

where $F_z := F + G_zH' \in \mathbb{R}^{m\times m}$ is a nilpotent matrix: $F_z^m = 0$, and only the $(1,m)$th entry of $F_z^{m-1}$ is nonzero (equal to 1). The above state-space realization is not minimal. A simple computation (taking the difference of the two subsystem states as the new state) shows that the above $2m$th-order representation is equivalent to the $m$th-order representation $(F_z, G_{e,t}, H', 1)$, where $G_{e,t} := G_z + L_{2,t}$. From the relation between the state-space representation and the impulse response, the $(i,j)$th entry of $\mathcal{M}_{e,y}$ is $H'F_z^{i-j-1}G_{e,j}$ if $i > j$, and $\mathcal{M}_{e,y}(i,i) = 1$. Because $F_z$ is nilpotent, $\mathcal{M}_{e,y}(i,j) = 0$ if $i - j - 1 \ge m$; namely, the lower triangular part of $\mathcal{M}_{e,y}$ is banded with bandwidth $(m+1)$. Thus, $K_y^{(T)}$ is banded (in its lower triangular part) with bandwidth $(m+1)$.
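The banded structure can also be seen directly from the impulse-response formula $\mathcal{M}_{e,y}(i,j) = H'F_z^{i-j-1}G_{e,j}$. For illustration, the sketch assumes the nilpotent shift realization of $F_z$ with $H = e_1$ and random, hypothetical time-varying gains $G_{e,t}$. It builds $\mathcal{M}_{e,y}$, forms $K_y = \mathcal{M}_{e,y}K_e\mathcal{M}_{e,y}'$ as in (124) with a diagonal innovation covariance, and checks that both vanish outside the band.

```python
import numpy as np

rng = np.random.default_rng(5)
m, T1 = 3, 10
Fz = np.eye(m, k=1)                              # nilpotent: Fz^m = 0 (assumed shift form)
H = np.zeros(m)
H[0] = 1.0
Ge = rng.standard_normal((T1, m))                # hypothetical time-varying gains G_{e,t}

M = np.eye(T1)                                   # M_{e,y}(i, i) = 1
for i in range(T1):
    for j in range(i):
        M[i, j] = H @ np.linalg.matrix_power(Fz, i - j - 1) @ Ge[j]

Ke = np.diag(rng.uniform(0.5, 2.0, T1))          # innovation variances
Ky = M @ Ke @ M.T

i, j = np.indices((T1, T1))
print(np.abs(M[i > j + m]).max(), np.abs(Ky[np.abs(i - j) > m]).max())   # both 0.0
```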

A. Proof of Proposition 6

We take the following steps to prove Lemma 2. First, by considering the equivalent open-loop, non-feedback communication problem, we show that in order for an $r^T$ to achieve $C_{T,n}$, the effective channel input $(I - \mathcal{G}_TZ_T^{-1})^{-1}r^T$ must waterfill the effective channel noise $(I - \mathcal{G}_TZ_T^{-1})^{-1}Z_TN^T$. Second, we show that this implies the following. If the optimizing $K_r^{(T)}$ has rank $k$, then the optimal channel output covariance matrix $K_{\tilde{y}}^{(T)}$ has its smallest (positive) eigenvalue, denoted $\lambda_0$, repeated exactly $k$ times. Therefore, $(K_{\tilde{y}}^{(T)} - \lambda_0 I)$ has rank $(T+1-k)$. This in turn implies that

$$M := K_y^{(T)} - \lambda_0 Z_{p,T}Z_{p,T}' \tag{126}$$

has rank $(T+1-k)$. However, since $K_y^{(T)}$ and $Z_{p,T}Z_{p,T}'$ are banded with bandwidth $(2m+1)$, it can be shown that $M$ has rank at least $(T+1-m)$. So $k$ can be no larger than $m$. The details follow.
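First, we prove that the effective channel input needs to waterfill the effective channel noise to achieve $C_{T,n}$. Since

$$\tilde{y}^T := Z_Ty^T = (I - \mathcal{G}_TZ_T^{-1})^{-1}(r^T + Z_TN^T) \tag{127}$$

and

$$K_u^{(T)} = (I - \mathcal{G}_TZ_T^{-1})^{-1}K_r^{(T)}(I - \mathcal{G}_TZ_T^{-1})^{-1\prime} + (I - \mathcal{G}_TZ_T^{-1})^{-1}\mathcal{G}_T\mathcal{G}_T'(I - \mathcal{G}_TZ_T^{-1})^{-1\prime} =: P_r + P_z, \tag{128}$$

where $P_r$ and $P_z$ are defined in the obvious way, the optimization problem for $C_{T,n}(\mathcal{P})$ can be recast as

$$\begin{aligned}
C_{T,n}(\mathcal{P}) = \sup_{A\in\mathbb{R}^{(n+1)\times(n+1)},\,C,\,\mathcal{G}_T}\ & \frac{1}{2(T+1)}\log\det\Big[(I - \mathcal{G}_TZ_T^{-1})^{-1}(K_r^{(T)} + Z_TZ_T')(I - \mathcal{G}_TZ_T^{-1})^{-1\prime}\Big]\\
\text{s.t.}\ & \operatorname{trace}(P_r + P_z)/(T+1) \le \mathcal{P}.
\end{aligned}\tag{129}$$

If the optimizing $\mathcal{G}_T^*$ is plugged into the above optimization problem, and noting that the resulting $P_z$ is independent of the choice of $K_r^{(T)}$, we end up with the following optimization problem:

$$\begin{aligned}
C_{T,n}(\mathcal{P}) = \sup_{A\in\mathbb{R}^{(n+1)\times(n+1)},\,C}\ & \frac{1}{2(T+1)}\log\det\Big[(I - \mathcal{G}_T^*Z_T^{-1})^{-1}(K_r^{(T)} + Z_TZ_T')(I - \mathcal{G}_T^*Z_T^{-1})^{-1\prime}\Big]\\
\text{s.t.}\ & \operatorname{trace}(P_r)/(T+1) \le \mathcal{P}.
\end{aligned}\tag{130}$$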

This may be viewed as a finite-horizon non-feedback capacity problem, in which the effective input to the channel without feedback is $(I - \mathcal{G}_TZ_T^{-1})^{-1}r^T$ and the effective channel noise is $(I - \mathcal{G}_TZ_T^{-1})^{-1}Z_TN^T$. This idea has been used in [33], [45], [51]. Thus, to maximize the mutual information between the effective input and the channel output $y^T$, the effective input must waterfill the effective noise.

The implication of the waterfilling argument is that, if the (effective) input covariance matrix has rank $k$, then the output covariance matrix must have its smallest eigenvalue repeated exactly $k$ times, as the name "waterfilling" suggests. In other words, if the waterfilling level (cutoff value) is $\lambda_0$, then $k$ eigenvalues of $K_{\tilde{y}}^{(T)}$ are equal to $\lambda_0$. These result from "waterfilling" by the $k$ positive eigenvalues of $(I - \mathcal{G}_TZ_T^{-1})^{-1}K_r^{(T)}(I - \mathcal{G}_TZ_T^{-1})^{-1\prime}$, a matrix with the same rank as $K_r^{(T)}$. The remaining eigenvalues are strictly larger than $\lambda_0$ and are unchanged by the waterfilling. Hence, if the optimizing $K_r^{(T)}$ has rank $k$, then the resulting $(K_{\tilde{y}}^{(T)} - \lambda_0 I)$ has rank $(T+1-k)$. Since $Z_{p,T}$ has full rank, it holds that

$$M := K_y^{(T)} - \lambda_0 Z_{p,T}Z_{p,T}' = Z_{p,T}(K_{\tilde{y}}^{(T)} - \lambda_0 I)Z_{p,T}' \tag{131}$$

has rank $(T+1-k)$. Since $K_y^{(T)}$ and $Z_{p,T}Z_{p,T}'$ are banded with bandwidth no larger than $(2m+1)$, $M$ is banded with bandwidth $(2m+1)$. Because $f_0 + g_0 = 0$, it holds that $Z_{p,T}(j+m, j) = 0$ for any $j$. Therefore, $M(j+m, j) = K_y^{(T)}(j+m, j)$. However, from the proof of Proposition 5 we have

$$K_y^{(T)} = Z_{p,T}\mathbb{E}u^Ty^{T\prime} + Z_{z,T}Z_{z,T}' - Z_{z,T}\mathcal{G}_T'Z_{p,T}', \tag{132}$$

which leads to $K_y^{(T)}(j+m, j) = f_0 \ne 0$ for any $j$; notice that $\mathcal{G}_T$ is strictly lower triangular and $Z_{p,T}(j+m, j) = 0$. The banded structure of $M$ then implies that the rank of $M$ is at least $(T+1-m)$. It immediately follows that $k$ can be no larger than $m$. This proves Lemma 2.
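The eigenvalue picture behind the waterfilling argument can be illustrated with the textbook water-filling solution for a Gaussian vector channel without feedback, applied to the eigenmodes of an effective-noise covariance. This is a generic sketch with a hypothetical noise covariance and power budget, not the paper's optimization. The input covariance that water-fills has rank $k$, and the output covariance then has the water level $\lambda_0$ as its smallest eigenvalue, repeated $k$ times.

```python
import numpy as np

def waterfill(noise_eigs, power):
    """Classical water-filling over eigenmodes: returns the level lam0 and the
    allocations max(lam0 - n_i, 0), which sum to `power`."""
    n = np.sort(noise_eigs)
    for k in range(len(n), 0, -1):           # largest k whose level exceeds the k-th smallest noise eigenvalue
        lam0 = (power + n[:k].sum()) / k
        if lam0 > n[k - 1]:
            break
    return lam0, np.maximum(lam0 - noise_eigs, 0.0)

rng = np.random.default_rng(6)
Q, _ = np.linalg.qr(rng.standard_normal((6, 6)))
Nf = Q @ np.diag([0.2, 0.5, 1.0, 2.0, 3.0, 5.0]) @ Q.T   # hypothetical effective-noise covariance

w, V = np.linalg.eigh(Nf)
lam0, alloc = waterfill(w, power=1.5)
S = V @ np.diag(alloc) @ V.T                  # water-filling input covariance
out_eigs = np.linalg.eigvalsh(S + Nf)         # eigenvalues of the output covariance

k = np.linalg.matrix_rank(S)
print(lam0, k, np.sum(np.isclose(out_eigs, lam0)))   # the level 16/15 is repeated k = 3 times
```

The three eigenmodes with noise below the level are raised exactly to $\lambda_0$. The other three (2, 3, and 5) are left unchanged, which is the "strictly larger" part of the argument.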

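Now we go back to Proposition 6. For any $m$th-order channel $Z(z)$, consider the following perturbation, which leads to an $(m+1)$st-order channel:

$$Z_\epsilon(z) = \frac{Z_{z,\epsilon}(z)}{Z_p(z)} = \frac{(1-\epsilon)(1 + f_{m-1}z^{-1} + \cdots + f_1z^{-m+1} + f_0z^{-m}) - \epsilon^iz^{-m-1}}{1 + (f_{m-1} + g_{m-1})z^{-1} + \cdots + (f_1 + g_1)z^{-m+1} + (f_0 + g_0)z^{-m}}, \tag{133}$$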
1 + (fm-1 + ffm-l)^ 1 + ••• + (/!+ 9l)z- m+1 + (/o + go)?-™ ' 
where $\epsilon > 0$ and $i$ is an integer to be determined. In other words, $Z_z(z)$ is perturbed to $Z_{z,\epsilon}(z)$. Consequently, $Z_{z,\epsilon,T} = (1-\epsilon)Z_{z,T} - \epsilon^iJ$, where $J$ is the down-shift matrix with $J(j+m+1, j) = 1$ for any $j$ and all other entries equal to 0.

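The number $i$ is chosen such that, for sufficiently small $\epsilon > 0$, we have $Z_{z,T}Z_{z,T}' > Z_{z,\epsilon,T}Z_{z,\epsilon,T}'$. Such an $i$ always exists. To see this, note that the difference of the two covariance matrices is

$$\begin{aligned}
&\epsilon(2-\epsilon)Z_{z,T}Z_{z,T}' + \epsilon^i(1-\epsilon)(JZ_{z,T}' + Z_{z,T}J') - \epsilon^{2i}JJ'\\
&\quad\ge \epsilon(2-\epsilon)Z_{z,T}Z_{z,T}' + \epsilon^i(1-\epsilon)(JZ_{z,T}' + Z_{z,T}J' - JJ'),
\end{aligned}\tag{134}$$

where the inequality uses $\epsilon^{2i} \le \epsilon^i(1-\epsilon)$, which holds for small $\epsilon$.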

As $Z_{z,T}Z_{z,T}'$ is positive definite and $(JZ_{z,T}' + Z_{z,T}J' - JJ')$ is symmetric, the lower bound in (134) admits simultaneous diagonalization, which transforms both terms into diagonal form. That is, there exists a nonsingular $S$ such that the congruence transformation using $S$ leads to

$$\epsilon(2-\epsilon)D_1 + \epsilon^i(1-\epsilon)D_2, \tag{135}$$

where $D_1$ and $D_2$ are diagonal and independent of $\epsilon$, and $D_1$ is positive definite. Let $\rho := \max_j |D_2(j,j)/D_1(j,j)|$. Then, for any $i$ such that

$$i > 1 + \frac{\log\big((2-\epsilon)/((1-\epsilon)\rho)\big)}{\log\epsilon}, \tag{136}$$

the above difference is positive definite. For sufficiently small $\epsilon$, the right-hand side of this inequality approaches 1, so it is sufficient to choose $i := 2$, independent of $\epsilon$. Hence $K_z^{(T)} > K_{z,\epsilon}^{(T)}$. Similarly, we can show that

$$K_{z,\epsilon_1}^{(T)} > K_{z,\epsilon_2}^{(T)} \tag{137}$$

if $\epsilon_1 < \epsilon_2$.
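The choice $i = 2$ can be checked numerically. The sketch uses a hypothetical second-order $Z_z(z) = 1 + 0.3z^{-1} + 0.2z^{-2}$ (so $m = 2$) and forms $Z_{z,\epsilon,T} = (1-\epsilon)Z_{z,T} - \epsilon^2J$. It then checks that $Z_{z,T}Z_{z,T}' - Z_{z,\epsilon,T}Z_{z,\epsilon,T}'$ is positive definite for a few small values of $\epsilon$. The test also confirms the expansion on the first line of (134).

```python
import numpy as np

m, T1, i = 2, 12, 2
f = [0.3, 0.2]                                    # hypothetical (f_{m-1}, f_0) of Z_z
Zz = np.eye(T1) + sum(c * np.eye(T1, k=-(k + 1)) for k, c in enumerate(f))   # Z_{z,T}
J = np.eye(T1, k=-(m + 1))                        # down-shift: J(j + m + 1, j) = 1

min_eig = {}
for eps in [1e-1, 1e-2, 1e-3]:
    Zze = (1 - eps) * Zz - eps**i * J             # Z_{z,eps,T}
    diff = Zz @ Zz.T - Zze @ Zze.T                # K_z - K_{z,eps}
    min_eig[eps] = np.linalg.eigvalsh(diff).min()
    print(eps, min_eig[eps])                      # positive: K_z > K_{z,eps}
```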

Next we show that, as $\epsilon > 0$ approaches zero, $P_T(\mathcal{R}, Z_{\epsilon,T})$ admits a limit that is no larger than $P_T(\mathcal{R}, Z_T)$. Because $K_z^{(T)} > K_{z,\epsilon}^{(T)}$, the feasible set $\phi_{Z_T,\mathcal{R}}$ for $Z_T$ is strictly contained in $\phi_{Z_{\epsilon,T},\mathcal{R}}$. Using the ordering in (137) and the capacity inequality proven in [15], it is seen that $P_T(\mathcal{R}, Z_{\epsilon,T})$ is no larger than $P_T(\mathcal{R}, Z_T)$ and is non-decreasing as $\epsilon$ approaches zero from above; hence the limit exists.

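Consider an arbitrarily small rate slack $\delta > 0$, i.e., consider $P_T(\mathcal{R} - \delta, Z_T)$. For any $\delta > 0$, there exists $\epsilon > 0$ such that the feasible set $\phi_{Z_{\epsilon,T},\mathcal{R}}$ is contained in $\phi_{Z_T,\mathcal{R}-\delta}$. To see that such an $\epsilon$ always exists, note that

$$\det\big(I + \Gamma_T(K_z^{(T)})^{-1}\Gamma_T'\big) = \det\big(I + \Gamma_T(K_{z,\epsilon}^{(T)})^{-1}\Gamma_T' + \Delta_\epsilon\big), \tag{138}$$

where $\Delta_\epsilon := \Gamma_T[(K_z^{(T)})^{-1} - (K_{z,\epsilon}^{(T)})^{-1}]\Gamma_T' \le 0$ vanishes as $\epsilon$ tends to zero. Thus, if

$$\frac{1}{2(T+1)}\log\det\big(I + \Gamma_T(K_{z,\epsilon}^{(T)})^{-1}\Gamma_T'\big) \ge \mathcal{R}, \tag{139}$$

then

$$\frac{1}{2(T+1)}\log\det\big(I + \Gamma_T(K_z^{(T)})^{-1}\Gamma_T'\big) \ge \mathcal{R} - \delta \tag{140}$$

for small enough $\epsilon$.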

Interconnecting the optimizing $r_\epsilon^T$ and $\mathcal{G}_{\epsilon,T}$ (obtained for $P_T(\mathcal{R}, Z_{\epsilon,T})$)$^{17}$ with $Z_T$, we see that the interconnection satisfies the rate constraint for $P_T(\mathcal{R} - \delta, Z_T)$. The consumed power becomes

$$\operatorname{trace}\Big[(I - \mathcal{G}_{\epsilon,T}Z_T^{-1})^{-1}\big(K_{r,\epsilon}^{(T)} + \mathcal{G}_{\epsilon,T}K_z^{(T)}\mathcal{G}_{\epsilon,T}'\big)(I - \mathcal{G}_{\epsilon,T}Z_T^{-1})^{-1\prime}\Big], \tag{141}$$

17 To be more rigorous, an arbitrarily small slack in $P_T(\mathcal{R}, Z_{\epsilon,T})$ may be needed, since this optimization problem is an infimization rather than a minimization problem. The idea of the proof is easily adapted when the slack is used, and the same result holds.
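which is greater than

$$P_T(\mathcal{R}, Z_{\epsilon,T}) = \operatorname{trace}\Big[(I - \mathcal{G}_{\epsilon,T}Z_T^{-1})^{-1}\big(K_{r,\epsilon}^{(T)} + \mathcal{G}_{\epsilon,T}K_{z,\epsilon}^{(T)}\mathcal{G}_{\epsilon,T}'\big)(I - \mathcal{G}_{\epsilon,T}Z_T^{-1})^{-1\prime}\Big], \tag{142}$$

but the difference, which depends on $(K_z^{(T)} - K_{z,\epsilon}^{(T)})$, vanishes as $\epsilon$ goes to zero. Consequently, as $\epsilon$ goes to zero, the power consumed by the sequence of interconnections converges to the limit of $P_T(\mathcal{R}, Z_{\epsilon,T})$, which is no larger than $P_T(\mathcal{R}, Z_T)$. However, since $Z_\epsilon$ is of order $(m+1)$, each optimizing $K_{r,\epsilon}^{(T)}$ has rank no larger than $(m+1)$ by Lemma 2. Therefore, $P_T(\mathcal{R}, Z_T)$ can be achieved by a sequence of $K_{r,\epsilon}^{(T)}$ with rank no larger than $(m+1)$. This proves the proposition.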



Appendix V

Proof of Proposition 8: Convergence to steady state

We show that system (62) converges to a steady state, as given by (92). To this aim, we first transform the Riccati recursion into a new coordinate system, then show that it converges to a limit, and finally prove that the limit is the unique stabilizing solution of the Riccati equation. Convergence to the steady state follows immediately from the convergence of the Riccati recursion.

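Consider a coordinate transformation given as

$$\bar{\mathcal{A}} := \Phi\mathcal{A}\Phi^{-1} = \begin{bmatrix} A & 0 \\ 0 & F \end{bmatrix}, \qquad \bar{\mathcal{C}} := \Phi^{-1\prime}\mathcal{C} = \begin{bmatrix} C + \psi'H \\ H \end{bmatrix}, \qquad \bar{\Sigma}_t := \Phi\Sigma_t\Phi', \tag{143}$$

where $\mathcal{A} = \begin{bmatrix} A & 0 \\ GC' & F \end{bmatrix}$, $\mathcal{C} = \begin{bmatrix} C \\ H \end{bmatrix}$,

$$\Phi := \begin{bmatrix} I_{n+1} & 0 \\ -\psi & I_m \end{bmatrix} \quad \Big(\text{i.e., } \Phi^{-1} = \begin{bmatrix} I_{n+1} & 0 \\ \psi & I_m \end{bmatrix}\Big), \tag{144}$$

and $\psi$ is the unique solution to the Sylvester equation

$$F\psi - \psi A = -GC'. \tag{145}$$

The existence and uniqueness of $\psi$ are guaranteed by the assumption on $A$ that $\lambda_i(-A) + \lambda_j(F) \ne 0$ for any $i$ and $j$ (see Section V-A).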
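The block-diagonalizing transformation (143)-(145) can be checked on a small example. The sketch assumes the augmented matrix has the lower block-triangular form $\mathcal{A} = [A\ \ 0;\ GC'\ \ F]$. This is the form for which the Sylvester equation $F\psi - \psi A = -GC'$ removes the off-diagonal block. All numerical values are hypothetical. The Sylvester equation is solved through its Kronecker (vectorized) form using NumPy only; `scipy.linalg.solve_sylvester`, which solves $AX + XB = Q$, would be an alternative.

```python
import numpy as np

rng = np.random.default_rng(7)
n1, m = 3, 2                                           # n + 1 and m
A = np.diag([1.5, -2.0, 3.0]) + np.triu(rng.standard_normal((n1, n1)), 1)   # anti-stable
F = np.array([[0.1, 1.0], [0.0, 0.1]])                 # stable
G = rng.standard_normal((m, 1))
C = rng.standard_normal((n1, 1))

# Solve F psi - psi A = -G C' via vec(F psi - psi A) = (I kron F - A' kron I) vec(psi)
Lsys = np.kron(np.eye(n1), F) - np.kron(A.T, np.eye(m))
rhs = (-G @ C.T).reshape(-1, order="F")
psi = np.linalg.solve(Lsys, rhs).reshape((m, n1), order="F")

Acal = np.block([[A, np.zeros((n1, m))], [G @ C.T, F]])    # assumed augmented matrix
Phi = np.block([[np.eye(n1), np.zeros((n1, m))], [-psi, np.eye(m)]])
Abar = Phi @ Acal @ np.linalg.inv(Phi)
print(np.round(Abar, 10))                              # block diagonal: diag(A, F)
```

The Kronecker system is nonsingular exactly when no eigenvalue of $F$ equals an eigenvalue of $A$, which mirrors the uniqueness condition stated above.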

Assume $k = n$ for the rest of the proof, i.e., $A$ is anti-stable. For the case $k < n$, we can further transform $\bar{\mathcal{A}}$, $\bar{\mathcal{C}}$, and $\bar{\Sigma}_t$ into $\check{\mathcal{A}}$, $\check{\mathcal{C}}$, and $\check{\Sigma}_t$ such that

$$\check{\mathcal{A}} = \operatorname{diag}[A_+, A_-, F], \tag{146}$$

where $A_+ \in \mathbb{R}^{(k+1)\times(k+1)}$ is anti-stable and $A_-$ is stable. That is, we decompose $A$ into block-diagonal form and incorporate its stable block into $F$. The following argument then carries over to the case $k < n$ with minor modifications; note that $A$ does not have any eigenvalues on the unit circle.

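The transformation defined in (143) transforms $\mathcal{A}$ into a block-diagonal form, with the unstable and stable eigenvalues in different diagonal blocks, and transforms the initial condition $\Sigma_0$ into

$$\bar{\Sigma}_0 := \Phi\Sigma_0\Phi' = \begin{bmatrix} I_{n+1} & -\psi' \\ -\psi & \psi\psi' \end{bmatrix}.$$

Therefore, the convergence of (65) with initial condition $\Sigma_0$ is equivalent to the convergence of

$$\bar{\Sigma}_{t+1} = \bar{\mathcal{A}}\bar{\Sigma}_t\bar{\mathcal{A}}' - \frac{\bar{\mathcal{A}}\bar{\Sigma}_t\bar{\mathcal{C}}\bar{\mathcal{C}}'\bar{\Sigma}_t\bar{\mathcal{A}}'}{\bar{\mathcal{C}}'\bar{\Sigma}_t\bar{\mathcal{C}} + 1}$$

with initial condition $\bar{\Sigma}_0$. By [47], $\bar{\Sigma}_t$ converges if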

det 



c's t c + 1 








" 




'In+l 




( 






-E 


) 




.0 




x 22 _ 





(147) 



(148) 



(149) 
where $X_{22}$ is the negative semi-definite solution to the discrete-time Lyapunov equation

X22 = FX22F' -{C + ip'H)(C + rp'R)'. 
Notice that $(C + \psi'H)$ is the upper $(n+1)\times 1$ block in $\bar{\mathcal{C}}$. Since



(150) 



det 




/ 



I 

x 22 \ 



det 



-/ VX22 ' 

V> I wx 22 

= det(-J) det (/ - WX 22 + WX22) 
* 0, 



(151) 



we conclude that $\bar{\Sigma}_t$ converges to a limit $\bar{\Sigma}_\infty$.

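This limit $\bar{\Sigma}_\infty$ is a positive semi-definite solution to

$$\bar{\Sigma} = \bar{\mathcal{A}}\bar{\Sigma}\bar{\mathcal{A}}' - \frac{\bar{\mathcal{A}}\bar{\Sigma}\bar{\mathcal{C}}\bar{\mathcal{C}}'\bar{\Sigma}\bar{\mathcal{A}}'}{\bar{\mathcal{C}}'\bar{\Sigma}\bar{\mathcal{C}} + 1}. \tag{152}$$

By [23], (152) has a unique stabilizing solution because $(\bar{\mathcal{A}}, \bar{\mathcal{C}}')$ is observable (since $(\mathcal{A}, \mathcal{C}')$ is observable) and $\bar{\mathcal{A}}$ does not have any eigenvalues on the unit circle. Therefore, $\bar{\Sigma}_\infty$ is this unique stabilizing solution, which can be computed from (152) as (see also [47])

$$\bar{\Sigma}_\infty = \begin{bmatrix} \Sigma_{11} & 0 \\ 0 & 0 \end{bmatrix}, \tag{153}$$

where $\Sigma_{11}$ is the positive definite solution to the reduced-order Riccati equation

$$\Sigma_{11} = A\Sigma_{11}A' - \frac{A\Sigma_{11}(C + \psi'H)(C + \psi'H)'\Sigma_{11}A'}{(C + \psi'H)'\Sigma_{11}(C + \psi'H) + 1}, \tag{154}$$

and $\bar{\Sigma}_\infty$ has rank equal to the number of anti-stable eigenvalues of $A$ (cf. [47]). Thus, $\Sigma_t$ converges to

$$\Sigma_\infty = \Phi^{-1}\bar{\Sigma}_\infty\Phi^{-1\prime} = \begin{bmatrix} \Sigma_{11} & \Sigma_{11}\psi' \\ \psi\Sigma_{11} & \psi\Sigma_{11}\psi' \end{bmatrix}, \tag{155}$$

with rank equal to the number of anti-stable eigenvalues of $A$.

ii) Immediately from i).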
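The convergence and rank statements can be illustrated on a small anti-stable example. The sketch iterates the Riccati recursion, in the form of (152) with unit measurement-noise variance, for a hypothetical $2\times 2$ anti-stable $A$ with $(A, C')$ observable, starting from $\Sigma_0 = I$. It also checks two standard facts for this case without process noise. First, the limiting error dynamics have the reciprocals of the eigenvalues of $A$ as eigenvalues. Second, $C'\Sigma_\infty C + 1 = |\det A|^2$.

```python
import numpy as np

def riccati_step(A, C, S):
    """S_{t+1} = A S A' - A S C C' S A' / (C' S C + 1), i.e. the recursion with unit noise variance."""
    ASC = A @ S @ C
    return A @ S @ A.T - np.outer(ASC, ASC) / (C @ S @ C + 1.0)

A = np.array([[2.0, 1.0], [0.0, -3.0]])    # anti-stable: eigenvalues 2 and -3 (hypothetical)
C = np.array([1.0, 1.0])                   # (A, C') is observable
S = np.eye(2)
for _ in range(200):
    S = riccati_step(A, C, S)

gain = A @ S @ C / (C @ S @ C + 1.0)
Acl = A - np.outer(gain, C)                # estimation-error dynamics under the limit
print(C @ S @ C + 1.0, np.linalg.det(A) ** 2)       # equal: 36 for this A
print(np.sort(np.abs(np.linalg.eigvals(Acl))))      # 1/3 and 1/2, the mirrored eigenvalues
print(np.linalg.matrix_rank(S))                     # 2 = number of anti-stable eigenvalues
```

In the scalar case $A = a > 1$, $C = 1$, the fixed point is $\Sigma_\infty = a^2 - 1$, which the test below also checks.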



Appendix VI 
Proof of Corollary Q] 

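Consider the coordinate transformations used in the proof of Proposition 8, which transform $\mathcal{A}$, $\mathcal{C}$, and $\Sigma$ into $\check{\mathcal{A}}$, $\check{\mathcal{C}}$, and $\check{\Sigma}$. Note that the block in $\check{\Sigma}_\infty$ (i.e., the solution to the Riccati equation defined by $\check{\mathcal{A}}$ and $\check{\mathcal{C}}$)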
associated with the A block is zero. By Proposition |£l in the new coordinates the rate and power due to the A 
block are both zero, and hence, in the original coordinates, the rate and power due to the stable eigenvalues of $A$ are both zero. We then remove the dimensions corresponding to $A_-$ in $\check{\mathcal{A}}$, $\check{\mathcal{C}}$, $\check{\Sigma}$, and the coordinate transformation matrix. It is easy to check that this leads to a reduced-order pair $(A_k, C_k)$ with $A_k$ anti-stable, satisfying $\mathcal{R}_{\infty,n}(A, C) = \mathcal{R}_{\infty,k}(A_k, C_k)$ and $\mathcal{P}_{\infty,n}(A, C) = \mathcal{P}_{\infty,k}(A_k, C_k)$.